[GE users] Job suspended every 60 seconds

Patrice Hamelin phamelin at clumeq.mcgill.ca
Thu Mar 10 13:14:51 GMT 2005


Reuti,

   At first I was killing all of the user's processes, but there was a 
problem with that.  The suspend script runs under the job owner's user ID, 
not as sgeadmin as I had first assumed.  As a result, the shell 
running the kills was killing itself, leading to unwanted results.
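The self-kill problem described above can be avoided by dropping the script's own shell from the PID list before any signal is sent. This is a minimal sketch that assumes a Linux node with a procps `ps`. `user_pids_except_self` is a hypothetical helper name. It excludes only the current shell (`$$`) and its parent (`$PPID`), not every ancestor process.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): print the PIDs owned by user $1,
# skipping the shell running this script ($$) and its parent ($PPID), so
# that signalling the list cannot stop or kill the script itself.
# The short-lived ps and awk processes of this pipeline may also be listed;
# they will normally have exited before any signal is sent.
user_pids_except_self() {
    # "pid=" suppresses the header line; awk strips the column padding.
    ps -u "$1" -o pid= |
        awk -v self="$$" -v parent="$PPID" '$1 != self && $1 != parent { print $1 }'
}

# Usage (prints the candidates only; nothing is signalled here):
# user_pids_except_self "$LOGNAME"
```

On a node that runs several jobs of the same user, this still selects the processes of all of them.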

   I tested my suspension script with two different MPI codes: one that 
does only communication, and another that computes a Jacobi 
integration in parallel.  The main problem in getting the PIDs is that I 
really have to target the two processes that are running the user's 
code, each eating 99% or so of the CPU.  I have to suspend only those 
two processes.  I verified that the processes are in the T state after the 
kill -19 command was sent to them.
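Both steps can be done with `ps` alone: picking out exactly the two busiest processes of the job owner, and then checking that they really reached the T state. This sketch makes the same assumptions as before (Linux and a procps `ps`, whose `--sort` option is not POSIX). The helper names `top_cpu_pids` and `is_stopped` are hypothetical. `ps` reports %CPU averaged over each process's lifetime rather than top's instantaneous value, but that should still rank long-running compute processes first.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of the $2 processes of user $1 that
# have used the most CPU, busiest first (procps "--sort", Linux only).
top_cpu_pids() {
    ps -u "$1" -o pid= --sort=-pcpu | head -n "$2" | awk '{ print $1 }'
}

# Hypothetical helper: succeed if process $1 is stopped, i.e. its STAT
# field starts with "T", as shown in "ps -e f" output.
is_stopped() {
    state=$(ps -o stat= -p "$1" | tr -d ' ')
    case $state in
        T*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Usage: stop the two busiest processes of the job owner and verify them.
# for pid in $(top_cpu_pids "$LOGNAME" 2); do
#     kill -STOP "$pid"
#     is_stopped "$pid" && echo "$pid stopped"
# done
```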

   I agree that there may be weird results from doing that kind of operation 
on an MPICH job, and I will also test the queue priority of 19 that you 
mentioned.  It looks promising!
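For reference, Reuti's alternative needs no suspend script at all. In the queue configuration (queue_conf(5)), the `priority` attribute sets the nice value at which the queue's jobs run. A dedicated MPICH-GM queue would therefore carry a line like the following, edited with `qconf -mq` on that queue. Only this one attribute is shown, and the queue itself is left unnamed here.

```
priority              19
```

Jobs in queues left at the default priority of 0 would then get most of the CPU on shared nodes, with no signals sent to the MPI processes.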

Ciao!

Reuti wrote:
> Hi Patrice,
> 
> I really think it's not a good idea to suspend an MPICH-GM job. IMO the easier 
> solution would be to have a special cluster queue with a priority of 19 for 
> them. Then any other job running on the nodes in another queue with a priority of 
> 0 will get most of the CPU time.
> 
> But anyway, if I understand your script correctly: you want to suspend 
> all jobs of a user on a node by selecting him/her by $LOGNAME in the top output? Then 
> the user's name must not appear in any other field at all, and the limitation is 
> one job per user per node. Also, head -2 will list only the first two lines; at 
> least I get only two blank lines with it (which platform/OS are you using?).
> 
> Whether you decide to use it or set up a special cluster queue for MPICH-GM, 
> the ps command is better suited, because it lets you specify a user and an 
> output format. Hence the complete:
> 
> top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'
> 
> can be:
> 
> ps --user $LOGNAME -o pid --no-headers
> 
> 
> The next enhancement is not to stop each process on its own, but the whole process 
> group (if you have a tight integration according to the Howto for MPICH, which 
> also has a hint for MPICH-GM) [of course: it's untested]:
> 
> for proc in `rsh $nodes ps --user $LOGNAME -o pgrp --no-headers | sort -u` ; do
>     rsh $nodes kill -19 -- -$proc
> done
> 
> If there is only one job on the node, you wouldn't need the loop at all now. 
> Did you verify on the nodes that your job is really suspended by your 
> script, e.g. by looking in the "ps -e f" output at the STAT field, which 
> shows T for stopped jobs?
> 
> 
> Cheers - Reuti
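Reuti's process-group suggestion can be sketched locally, without rsh. This version assumes bash and a procps `ps`. `signal_pgroups_of` is a hypothetical helper. It takes a signal name and the PIDs of the job's processes, signals each distinct process group once, and skips the caller's own group to avoid the self-kill problem Patrice ran into. Unlike plain `uniq`, `sort -u` also removes duplicate group IDs that are not on adjacent lines.

```shell
#!/bin/bash
# Hypothetical helper: send signal $1 (e.g. STOP or CONT) once to each
# distinct process group containing one of the PIDs given as $2...,
# skipping the process group of the calling shell itself.
signal_pgroups_of() {
    local sig=$1 own pid pgid
    shift
    own=$(ps -o pgid= -p "$$" | tr -d ' ')
    for pid in "$@"; do
        ps -o pgid= -p "$pid"
    done | tr -d ' ' | sort -u | while read -r pgid; do
        if [ -n "$pgid" ] && [ "$pgid" != "$own" ]; then
            # A negative PID after "--" addresses the whole process group.
            kill -"$sig" -- -"$pgid"
        fi
    done
}

# Usage: signal_pgroups_of STOP 12345 12346   # later: signal_pgroups_of CONT ...
```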
> 
> 
> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
> 
> 
>>Reuti,
>>
>>   Thanks for your answer, and sorry for not giving enough details.  I am 
>>using GE 6.0u1 with the MPICH-GM implementation. I found nothing interesting 
>>in the qmaster messages file.  My script simply sends SIGSTOP signals to 
>>all the MPI processes on all nodes belonging to the job.  I tested it with 
>>a simple communication program, but I still have to test it in a real 
>>production environment; that is the next step.  You will find my script 
>>below.  The "unsuspend" script simply sends a SIGCONT signal to the processes.
>>
>>   I worked around the repeated suspension by creating a marker file at the first suspension.
>>
>>F=/tmp/suspend_MPI_job.$LOGNAME.log
>>touch $F
>>
>>if [ -f $TMPDIR/suspended ];then
>>   echo "`date` Job already suspended; exiting" >> $F
>>   exit
>>fi
>>#
>># For each node
>>#
>>   for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
>>   do
>>#
>># Create a file that contains PIDs of suspended processes
>>#
>>     touch $TMPDIR/$nodes
>>     > $TMPDIR/$nodes
>>#
>># Determine processes to suspend
>>#
>>     for proc in `rsh $nodes top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'`
>>     do
>>       echo "`date` Suspending process $proc on $nodes" >> $F
>>       echo $proc >> $TMPDIR/$nodes
>>       rsh $nodes kill -19 $proc
>>     done
>>   done
>>touch $TMPDIR/suspended
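Patrice notes that the matching "unsuspend" script simply sends SIGCONT. One possible shape for it is sketched here. It assumes the script reads back the per-node PID files and the `$TMPDIR/suspended` marker that the suspend script above writes. `resume_job` and `REMOTE_SHELL` are hypothetical names, not SGE features. In the queue configuration, such a script would go into the `resume_method` field.

```shell
#!/bin/bash
# Sketch of a matching resume ("unsuspend") method. Assumptions: the suspend
# script has left one file per node, $TMPDIR/<node>, listing the PIDs it
# stopped, plus the marker file $TMPDIR/suspended. REMOTE_SHELL is a
# hypothetical override that defaults to rsh, as in the suspend script.
REMOTE_SHELL=${REMOTE_SHELL:-rsh}

resume_job() {
    local node pid
    # Nothing to resume if the suspend script never ran for this job.
    [ -f "$TMPDIR/suspended" ] || return 0
    for node in $(uniq "$TMPDIR/machines"); do
        [ -f "$TMPDIR/$node" ] || continue
        while read -r pid; do
            if [ -n "$pid" ]; then
                # </dev/null stops rsh from consuming the rest of the PID list.
                $REMOTE_SHELL "$node" kill -CONT "$pid" </dev/null
            fi
        done < "$TMPDIR/$node"
        rm -f "$TMPDIR/$node"
    done
    # Remove the marker so that the next suspension is not skipped.
    rm -f "$TMPDIR/suspended"
}

# When installed as the queue's resume_method, the script would end with:
# resume_job
```

Removing the marker file matters as much as sending SIGCONT. If the marker were left in place, the check at the top of the suspend script would skip every later suspension of the same job.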
>>
>>
>>Reuti wrote:
>>
>>>Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>
>>>
>>>
>>>>  I wrote a script to suspend an MPI job in a queue and put its path 
>>>>in the "suspend method" field of the queue configuration.  My problem is 
>>>>that SGE keeps trying to suspend the job again every minute, even 
>>>>though I set up my queue like this:
>>>>
>>>>suspend_thresholds    load_avg=3.0
>>>>nsuspend              0
>>>>suspend_interval      INFINITY
>>>
>>>
>>>It may not be easy to suspend an MPI job at all (which MPI implementation?),
>>>because of possible timeouts in the communication. What exactly are you doing
>>>in your script, which version of SGE are you using, and are there any entries
>>>in the messages files of the qmaster and/or execd? 60 seconds is just like
>>>the default notify time - how did you submit your job? - Reuti
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>
>>-- 
>>Patrice Hamelin ing, M.Sc.A, CCNA
>>Systems Administrator
>>CLUMEQ Supercomputer Centre
>>McGill University
>>688 Sherbrooke Street West, Suite 710
>>Montreal, QC, Canada H3A 2S6
>>Tel: 514-398-3344
>>Fax: 514-398-2203
>>http://www.clumeq.mcgill.ca
>>

