[GE users] CPU limit in mpi jobs

Rui Ramos rramos at iric.up.pt
Mon Jun 12 09:53:36 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


 Hi, thanks for the quick reply.

 I've attached the output of qacct -j for a finished LAM/MPI job.

 I can see that for each instance I get:

 "cpu          0"         

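 To check every entry at once rather than by eye, something like the
 following could be used. It is only a sketch: it assumes that qacct -j
 prints one block per entry, that blocks are separated by lines of "="
 characters, that each field is a "name value" line, and that cpu is a
 plain number of seconds. The function names are made up for illustration.

```python
def parse_qacct(text):
    """Split `qacct -j` output into one dict per accounting entry.

    Assumption: entries are separated by lines of '=' characters and
    every field is printed as 'name<whitespace>value'.
    """
    entries, current = [], {}
    for line in text.splitlines():
        if line.startswith("====="):
            # A separator line closes the previous entry, if there was one.
            if current:
                entries.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        entries.append(current)
    return entries


def total_cpu(entries):
    """Sum the cpu field over all entries (assumed to be seconds)."""
    return sum(float(entry.get("cpu", 0)) for entry in entries)
```

 Feeding it the saved qacct output, total_cpu(parse_qacct(text)) gives the
 summed CPU time, and len(parse_qacct(text)) the number of entries.
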
 I read the thread on the bug report.

 "...SGE still keeps this job, as it never realises, that the h_cpu 
  limit of the kernel-setrlimit was reached by one process. On the head 
  node of the parallel job,  the job script already exited and isn't 
  any longer in the process tree. So the desired behavior could be 
  to kill all slave tasks, if the main script already finished."

 Quite interesting.
  - I think most of these problems could be solved if, instead of calling
    the kernel setrlimit function, SGE called a method in the sge_execd
    daemon that stores the time limit for that process and, once the
    limit is reached, kills the process and sends a response to the
    scheduler. Whether this is possible or not, I don't know.
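
 To make the idea concrete, here is a minimal sketch of the check such a
 daemon-side watchdog might run. This is not how sge_execd works; the
 TaskUsage record and the tasks_over_limit function are hypothetical, and
 the per_job switch only illustrates the two possible meanings of the
 limit (per instance, or summed over all instances).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TaskUsage:
    """CPU usage reported for one task of a parallel job (hypothetical)."""
    host: str
    pid: int
    cpu_seconds: float


def tasks_over_limit(tasks: Iterable[TaskUsage], limit_seconds: float,
                     per_job: bool = False) -> List[TaskUsage]:
    """Return the tasks a daemon-side watchdog would kill.

    per_job=False: each task is checked against the limit on its own.
    per_job=True:  the summed CPU time of all tasks is checked, and every
                   task is returned once the sum reaches the limit.
    """
    tasks = list(tasks)
    if per_job:
        total = sum(t.cpu_seconds for t in tasks)
        return tasks if total >= limit_seconds else []
    return [t for t in tasks if t.cpu_seconds >= limit_seconds]
```

 Because the decision is taken by the daemon and not by the kernel, the
 daemon would know which tasks were stopped and could report that back.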

                                                      Any thoughts on solving this issue?

 I don't quite know how MPI communication works, but if there is a slave task that keeps looping until it receives

On Fri, 9 Jun 2006 23:13:59 +0200
Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi again,
> 
> the CPU limit is working in principle, but for now there is a  
> possible race condition:
> 
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1960
> 
> The job will disappear, but some slaves keep on running.
> 
> 
> Regarding the usage: usage reporting for a parallel job is working
> for me. Can you try, after a job has finished normally:
> 
> qacct -j <jobid>
> 
> which should also show one entry for each qrsh call.
> 
> -- Reuti
> 
> 
> Am 08.06.2006 um 20:22 schrieb Rui Ramos:
> 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> >
> >  Hi all,
> >
> >  Well, I've tried setting up the tight MPI integration with no luck.
> > I've also followed the LAM/MPI integration howto and set it up with
> > tight integration. It seems to work for some jobs; I still have to
> > run some more tests. Anyway, the CPU time reported for the MPI jobs
> > is always 00:00:00.
> >
> >  Does anybody have CPU limits working with MPI jobs?
> >
> >                                                    Appreciate any
> > help :)
> >
> > PS: Yes, it is a tight integration of LAM/MPI as explained in
> > Reuti's howto.
> >
> > On Fri, 2 Jun 2006 16:55:05 +0100
> > Rui Ramos <rramos at iric.up.pt> wrote:
> >
> >>
> >>  Well, I guess I don't have the tight integration. I'm reading your
> >> howto and the symptoms are the ones described there.
> >>
> >>     http://gridengine.sunsource.net/howto/mpich-integration.html
> >>
> >>                                                                   
> >> Regards, going to try it out
> >>
> >> On Fri, 2 Jun 2006 17:40:50 +0200
> >> Reuti <reuti at staff.uni-marburg.de> wrote:
> >>
> >>> Hi,
> >>>
> >>> Am 02.06.2006 um 17:39 schrieb Rui Ramos:
> >>>
> >>>>
> >>>>  Hi all,
> >>>>
> >>>>  I've set CPU limits in some of my queues. But there is something
> >>>> that worries me. When submitting an MPI job, is this CPU limit
> >>>> applied to each MPI instance or to the sum of all instances?
> >>>>  Another thing is that when doing a qstat I get
> >>>>
> >>>> usage    1:                 cpu=00:00:00, mem=0.00050 GBs,
> >>>> io=0.00000, vmem=121.828M, maxvmem=121.828M
> >>>>
> >>>>  And the CPU time is always 00:00:00. Is the CPU limit really
> >>>> working with MPI jobs?
> >>>
> >>> is it a Tightly Integrated setup? - Reuti
> >>>
> >>>>                                                    thanks in  
> >>>> advance
> >>>>
> >> -- 
> >> ============================================
> >>  Rui Manuel dos Santos Ramos
> >>
> >>  Instituto de Recursos e Iniciativas Comuns
> >>  Praca Gomes Teixeira, 4099-002 Porto, Portugal
> >>
> >>  phone : +351 223 401 571
> >>  e-mail: rramos[at]iric.up.pt
> >>     web: http://ruiramos.homeip.net
> >> ============================================
> >>
> >>
> >
> >
> > - --
> > ============================================
> >  Rui Manuel dos Santos Ramos
> >
> >  Instituto de Recursos e Iniciativas Comuns
> >  Praca Gomes Teixeira, 4099-002 Porto, Portugal
> >
> >  phone : +351 223 401 571
> >  e-mail: rramos[at]iric.up.pt
> >     web: http://ruiramos.homeip.net
> > ============================================
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (GNU/Linux)
> >
> > iQEVAwUBRIhqz71uR0bdnTWSAQIA3Qf/Xhh3qXS+tDaGNY4Jb3p7a1dBbiYeBk11
> > qPDCrX31GxNndfE5H6TWrIZbXwk1eCQQud8eShOyFeEWJYx95J43uE46NL5L7rqZ
> > IXh2ZgqyaB+aG8AUU3Q/B/TItZz3TfiJmyAQHFVPn1+chQtnGKbloOnk+Cf11Cp+
> > u0bPe/hfeyRsTVP4UPGwCFO4B0Q9buanvPvwwvyPi2VNL6pINLc6ym54hQTubDqP
> > 3pxzKCzCvs3BkFk3NpzQIXpNPRkEnFaQSXiDZi/5K4mEBhbi9PvJNfS6zej7NlTW
> > dGSqcyMSgn3prjVF2RFpRrXWh2OsMndgt8sxkQ5KSQGIg4wCdpD+dQ==
> > =q47k
> > -----END PGP SIGNATURE-----
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> 


- -- 
============================================
 Rui Manuel dos Santos Ramos

 Instituto de Recursos e Iniciativas Comuns
 Praca Gomes Teixeira, 4099-002 Porto, Portugal

 phone : +351 223 401 571
 e-mail: rramos[at]iric.up.pt
    web: http://ruiramos.homeip.net
============================================

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iQEVAwUBRI0rkL1uR0bdnTWSAQKDSwf/Swg6msa87bds07Xg4XZHOhWB0KaLgYA4
NDCvBm+GrUMaaoQZBN1D+6PL0x7DbJkHjUYq0tUj6rySoP6zPdSHUeDZnF3yX+lh
voCS+PWiQYDX3oHdze3SnikckjCRQDCPojZv+IyhjoPF4UpGjP4hGmfjtlHNCCOb
FQQCVDWb3EA1N2qkW7sjTerHhExFBCoCXya27W3CY2TeMZPn3W2qNBTiyV5rNH9a
NZWv4JelzTHkAyEAo1/Jt1ivbDTwGXN0NywuRQ9sqwPW/TU5sJuTgRxAN4krToYk
GMgGVJcu9r8SmTEfYfIFQICV484Cm+Rm2lRkzef7MrjTyRRQweEQLA==
=zfYF
-----END PGP SIGNATURE-----


    [ Part 2, Text/PLAIN (Name: "out.txt") 338 lines. ]
    [ Unable to print this part. ]




