[GE users] Job suspended every 60 seconds

Patrice Hamelin phamelin at clumeq.mcgill.ca
Tue Mar 15 15:33:54 GMT 2005



Stephan,

  Thanks for your answer.  I have this in qconf -mconf:

reprioritize                 0
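
That is already 0, which I take to be the same as "false".  For the
record, it can also be checked without opening the editor, e.g.:

  qconf -sconf | grep reprioritize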




Stephan Grell - Sun Germany - SSG - Software Engineer wrote:

> You should also set the
>
> reprioritize false
>
> setting in the cluster configuration (qconf -mconf)
>
> Stephan
>
> Reuti wrote:
>
>> Mmh, the reprioritize_interval in the scheduler is set to 0:0:0, so
>> SGE is not changing the priority on its own? For me it's working and
>> I get a nice of 19 on all slave nodes of a parallel job. Are some
>> jobs going to the wrong queue?
>>
>> My idea was not to renice the jobs during their execution, but to
>> start them already at nice 19 on all nodes. When there is only one
>> job, it will still get most of the CPU time even at nice 19.
>>
>> Cheers - Reuti
>>
>> Patrice Hamelin wrote:
>>
>>> Reuti,
>>>
>>>   I tried to renice the processes using the priority setting of the
>>> queues, and it shares the processors between the processes (a
>>> 66%-33% ratio).  I also tried to renice the parallel job to the
>>> lower priority of 19, but it is not propagating to the slave nodes.
>>> Only the master node processes have a higher scheduling priority.
>>>
>>>   I think I will stick to my suspension scripts, since I really need
>>> the higher priority queue to get ALL the processors whenever they
>>> need them.  I would warn the users of the lower priority queue that
>>> the results can be bad if suspension occurs.
>>>
>>> Thanks.
>>>
>>> Patrice Hamelin wrote:
>>>
>>>> Reuti,
>>>>
>>>>   At first I was killing all the user's processes, but there was a
>>>> problem with that.  The suspend script runs as the user ID itself,
>>>> and not as sgeadmin, as I first thought.  The result was that the
>>>> shell running the kills was killing itself, leading to unwanted
>>>> results.
>>>>
>>>>   I tested my suspension script with two different MPI codes, one
>>>> that does only communication, and another one that computes a
>>>> Jacobi integration in parallel.  The main problem in getting the
>>>> PIDs is that I really have to target the two processes that are
>>>> running the user's code and eating 99% or so of the CPU each. I
>>>> have to suspend only those two processes.  I verified that the
>>>> processes are in the T state after the kill -19 command was sent to
>>>> them.
>>>>
>>>>   I agree that there may be weird results doing that kind of
>>>> operation on an MPICH job, and I will also test the queue priority
>>>> of 19 that you mentioned.  It looks promising!
>>>>
>>>> Ciao!
>>>>
>>>> Reuti wrote:
>>>>
>>>>> Hi Patrice,
>>>>>
>>>>> I really think it's not a good idea to suspend an MPICH-GM job.
>>>>> IMO the easier solution would be to have a special cluster queue
>>>>> with a priority of 19 for them. Then any other job running on the
>>>>> nodes in another queue with a priority of 0 will get most of the
>>>>> CPU time.
>>>>>
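
For reference, such a low-priority cluster queue could be set up
roughly like this ("low.q" and the host group are only placeholders;
the priority value is the nice level given to jobs in that queue):

  qconf -aq low.q        # opens the queue template in an editor
  # in the editor, set for example:
  #   qname      low.q
  #   hostlist   @allhosts
  #   priority   19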
>>>>> But anyway, if I understand your script correctly: you want to
>>>>> suspend all jobs of a user on a node by selecting him/her via the
>>>>> $LOGNAME in the top output? Then the user's name may not appear in
>>>>> any other field at all, and only one job per user per node is the
>>>>> limitation. And: head -2 will list only the first two lines; at
>>>>> least I get just two blank lines with it (which platform/OS are
>>>>> you using?).
>>>>>
>>>>> Whether you decide to use it or set up a special cluster queue for
>>>>> MPICH-GM: the ps command is better suited, because there you can
>>>>> specify a user and an output format, hence the complete:
>>>>>
>>>>> top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'
>>>>>
>>>>> can be:
>>>>>
>>>>> ps --user $LOGNAME -o pid --no-headers
>>>>>
>>>>>
>>>>> The next enhancement is not to stop each process on its own, but
>>>>> the whole process group (if you have a tight integration according
>>>>> to the Howto for MPICH, which also has a hint for MPICH-GM) [of
>>>>> course: it's untested]:
>>>>>
>>>>> for proc in `rsh $nodes ps --user $LOGNAME -o pgrp --no-headers | uniq` ; do
>>>>>     rsh $nodes kill -19 -- -$proc
>>>>> done
>>>>>
>>>>> If there is only one job on the node, you wouldn't need the loop at
>>>>> all. Did you verify on the nodes that your job is really suspended
>>>>> by your script, by looking at, e.g., the "ps -e f" output, where the
>>>>> STAT field will show T for stopped jobs?
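
A quick way to do that check across all nodes of the job (untested; it
assumes the same $TMPDIR/machines file used in the suspend script
below):

  for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
  do
    echo "--- $nodes ---"
    rsh $nodes ps --user $LOGNAME -o pid,stat,comm
  done
  # a STAT value containing "T" means the process is stopped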
>>>>>
>>>>>
>>>>> Cheers - Reuti
>>>>>
>>>>>
>>>>> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>>>
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>>   Thanks for your answer, and sorry for not giving enough details.
>>>>>> I am using GE 6.0u1 with the MPICH-GM implementation. I found
>>>>>> nothing interesting in the qmaster messages file.  My script simply
>>>>>> sends SIGSTOP signals to all the MPI processes on all nodes that
>>>>>> are members of the job.  I tested it with a simple communication
>>>>>> program, but I still have to test it in a real production
>>>>>> environment as a next step.  You will find my script below.  The
>>>>>> "unsuspend" script simply sends a SIGCONT signal to the processes.
>>>>>>
>>>>>>   I avoid the re-suspension by creating a marker file at the first
>>>>>> suspension.
>>>>>>
>>>>>> F=/tmp/suspend_MPI_job.$LOGNAME.log
>>>>>> touch $F
>>>>>>
>>>>>> if [ -f $TMPDIR/suspended ];then
>>>>>>   echo "`date` Job already suspended; exiting" >> $F
>>>>>>   exit
>>>>>> fi
>>>>>> #
>>>>>> # For each node
>>>>>> #
>>>>>>   for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
>>>>>>   do
>>>>>> #
>>>>>> # Create a file that contains PIDs of suspended processes
>>>>>> #
>>>>>>     touch $TMPDIR/$nodes
>>>>>>     > $TMPDIR/$nodes
>>>>>> #
>>>>>> # Determine processes to suspend
>>>>>> #
>>>>>>     for proc in `rsh $nodes top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'`
>>>>>>     do
>>>>>>       echo "`date` Suspending process $proc on $nodes" >> $F
>>>>>>       echo $proc >> $TMPDIR/$nodes
>>>>>>       rsh $nodes kill -19 $proc
>>>>>>     done
>>>>>>   done
>>>>>> touch $TMPDIR/suspended
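
For reference, the matching "unsuspend" script could look roughly like
this (a sketch, untested; it assumes the per-node PID files written by
the suspend script above):

F=/tmp/suspend_MPI_job.$LOGNAME.log
#
# For each node, resume the PIDs recorded at suspension time
#
for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
do
  for proc in `cat $TMPDIR/$nodes`
  do
    echo "`date` Resuming process $proc on $nodes" >> $F
    # SIGCONT is the counterpart of the kill -19 (SIGSTOP) above
    rsh $nodes kill -CONT $proc
  done
done
#
# Clear the marker so a later suspension is handled again
#
rm -f $TMPDIR/suspended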
>>>>>>
>>>>>>
>>>>>> Reuti wrote:
>>>>>>
>>>>>>> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>  I wrote a script to suspend MPI jobs in a queue and included
>>>>>>>> that path in the "suspend_method" field of the queue
>>>>>>>> configuration.  My problem is that SGE keeps trying to suspend
>>>>>>>> the job again every minute, even though I set up my queue like:
>>>>>>>>
>>>>>>>> suspend_thresholds    load_avg=3.0
>>>>>>>> nsuspend              0
>>>>>>>> suspend_interval      INFINITY
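
For checking, something like the following should show the relevant
queue fields ("mpi.q" is only a placeholder for the actual queue name):

  qconf -sq mpi.q | egrep 'suspend_method|resume_method|suspend_thresholds|nsuspend|suspend_interval|notify'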
>>>>>>>
>>>>>>> It may not be easy to suspend an MPI job at all (which MPI
>>>>>>> implementation?), because of possible timeouts in the
>>>>>>> communication. What are you doing in your script exactly, which
>>>>>>> version of SGE, and: are there any entries in the messages files
>>>>>>> of the qmaster and/or execd? Just 60 seconds is exactly the
>>>>>>> default notify time - how did you submit your job? - Reuti
>>>>>>>
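
For context: the 60-second figure Reuti mentions matches the queue's
"notify" parameter (default 00:00:60), which only comes into play for
jobs submitted with qsub -notify, e.g. (placeholder PE and script
names):

  qsub -notify -pe mpich 4 myjob.sh
  qconf -sq mpi.q | grep notify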
>>>>>>
>>>>>> -- 
>>>>>> Patrice Hamelin ing, M.Sc.A, CCNA
>>>>>> Systems Administrator
>>>>>> CLUMEQ Supercomputer Centre
>>>>>> McGill University
>>>>>> 688 Sherbrooke Street West, Suite 710
>>>>>> Montreal, QC, Canada H3A 2S6
>>>>>> Tel: 514-398-3344
>>>>>> Fax: 514-398-2203
>>>>>> http://www.clumeq.mcgill.ca
>>>>>>
>>>>>
>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



