[GE users] Jobs running but not using resources

Hugo Hernandez-Mora hugo.hernandez at loni.ucla.edu
Sat Nov 1 19:29:58 GMT 2008



Reuti,
The SGE version we are using is 6.1u4.  You are right about the rule for swap_total.  I have removed it, because what we are really trying to do is prevent users from using 100% of the memory on each compute node.  As for the output of the qquota command, here is what I get:


myhost> qquota -u "*"
resource quota rule limit                filter
--------------------------------------------------------------------------------
memory_usage/1     mem_total=7g         users {*} hosts {@v20zHosts}
memory_usage/2     mem_total=15g        users {*} hosts {@x2200Hosts}
max_per_queue/1    slots=7/672          users user1 queues short.q
max_per_queue/1    slots=26/672         users user2 queues short.q
max_per_queue/1    slots=1/672          users user3 queues short.q
max_per_queue/1    slots=46/672         users user4 queues short.q
max_per_queue/2    slots=2/192          users user5 queues medium.q
max_per_queue/2    slots=2/192          users user1 queues medium.q
max_per_queue/2    slots=1/192          users user6 queues medium.q
max_per_queue/3    slots=2/111          users user7 queues long.q
max_per_queue/3    slots=1/111          users user8 queues long.q
max_per_queue/3    slots=1/111          users user1 queues long.q
max_per_queue/3    slots=12/111         users user7 queues long.q
max_per_queue/3    slots=45/111         users user4 queues long.q
max_per_queue/4    slots=109/1810       users user0 queues queue0.q

user5 and user7 are running array jobs, and they are the ones reporting very low CPU usage.
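
For anyone reading the table above, a rough Python sketch that totals the used slots per user from `qquota -u "*"` output may help to see who holds what. It is not part of SGE: the function name `slots_by_user` and the regular expression are assumptions based on the column layout shown above, and the sample lines are copied from that output.

```python
import re
from collections import defaultdict

# A few lines copied from the qquota output above.
SAMPLE = """\
memory_usage/1     mem_total=7g         users {*} hosts {@v20zHosts}
max_per_queue/1    slots=7/672          users user1 queues short.q
max_per_queue/2    slots=2/192          users user5 queues medium.q
max_per_queue/2    slots=2/192          users user1 queues medium.q
max_per_queue/3    slots=2/111          users user7 queues long.q
max_per_queue/3    slots=12/111         users user7 queues long.q
"""

# Assumed layout: <rule>  slots=<used>/<limit>  users <user> queues <queue>
LINE_RE = re.compile(
    r"^(\S+)\s+slots=(\d+)/(\d+)\s+users\s+(\S+)\s+queues\s+(\S+)\s*$"
)

def slots_by_user(text):
    """Sum the used slots per user over all slot-limit lines."""
    usage = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header, separator, or a non-slot rule such as mem_total
        _rule, used, _limit, user, _queue = match.groups()
        usage[user] += int(used)
    return dict(usage)

if __name__ == "__main__":
    for user, used in sorted(slots_by_user(SAMPLE).items()):
        print(user, used)
```

Lines that do not describe a slot limit are skipped, so the memory rules do not disturb the totals.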

Thanks for your time and your help!!!!
-Hugo

--
Hugo R. Hernandez-Mora, M.Sc.
System Administrator
Laboratory of Neuro Imaging, UCLA
635 Charles E. Young Drive South, Suite 225
Los Angeles, CA 90095-7332
Tel: 310.267.5076
Fax: 310.206.5518
hugo.hernandez at loni.ucla.edu
--

"If your efforts were met with indifference, do not lose heart,
for the sun puts on a marvelous show every morning
while most people are still asleep"


-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Thursday, October 30, 2008 5:28 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Jobs running but not using resources

Hi,

On 30.10.2008 at 22:51, Hugo Hernandez-Mora wrote:

> Hello all,
> We have been seeing strange behavior in our cluster since last
> weekend.  Most of the jobs running on our cluster (we have
> 300+ Sun Fire V20z and 80 Sun Fire X2200 nodes with 3,500+ available slots)
> are not using resources as expected.  Indeed, most of them are
> not using any resources at all (0 CPU for the associated processes).

Which SGE version?

Do you mean jobs are scheduled but doing nothing? Or aren't the jobs
scheduled at all?

> We have set the following resource limits:
>
> {
>    name         memory_usage
>    description  Limit the memory used for all users (per machine type)
>    enabled      TRUE
>    limit        users {*} hosts {@v20zHosts} to mem_total=7g
>    limit        users {*} hosts {@x2200Hosts} to mem_total=15g
>    limit        users {*} to swap_total=10g

I'm puzzled by this last rule. Are you requesting swap_total for
the jobs? If one of the earlier rules matches the job, the rules
after it in the same rule set won't be checked at all.
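
To illustrate the first-match behaviour described above, here is a toy Python model of that rule set. It is a simplified sketch, not SGE's implementation: the dictionary layout, the function `first_matching_rule`, and the host group `@otherHosts` are hypothetical, and the `users {*}` filter is left out because it matches everyone.

```python
# Toy model of the memory_usage rule set quoted in this thread.
# "hosts" is the host group filter; None means the rule has no host filter.
RULES = [
    {"hosts": "@v20zHosts", "limit": ("mem_total", "7g")},
    {"hosts": "@x2200Hosts", "limit": ("mem_total", "15g")},
    {"hosts": None, "limit": ("swap_total", "10g")},
]

def first_matching_rule(rules, job_hostgroup):
    """Return the first rule whose filter matches; later rules are ignored."""
    for rule in rules:
        if rule["hosts"] is None or rule["hosts"] == job_hostgroup:
            return rule
    return None

if __name__ == "__main__":
    # @otherHosts is a made-up host group used only for this illustration.
    for group in ("@v20zHosts", "@x2200Hosts", "@otherHosts"):
        print(group, first_matching_rule(RULES, group)["limit"])
```

In this model a job on either of the two host groups stops at its mem_total rule, so the swap_total rule is only reached by jobs on hosts outside both groups.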

> }
> {
>    name         sysadm_rule
>    description  Restrict user user1 to use only 50 slots in queue0.q queue
>    enabled      TRUE
>    limit        users {user1} queues queue0.q to slots=50
> }
> {
>    name         max_per_queue
>    description  Limit the maximum allowed cluster queue slots per user
>    enabled      TRUE
>    limit        users {*} queues short.q to slots=672
>    limit        users {*} queues medium.q to slots=192
>    limit        users {*} queues long.q to slots=111
>    limit        users {*} queues special.q to slots=1810
> }
>
> With the last rule set, max_per_queue, we prevent any single user
> from using all the available slots in a queue, so that nobody can
> monopolize the resources of the cluster.  The total number of
> available slots per queue is:
>
> myhost> qstat -g c
> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> -------------------------------------------------------------------------------
> long.q                            0.48    185      0    240     41     32
> medium.q                          0.48      5     59    330    230     40
> special.q                         0.57    134   1741   2190     10    325
> short.q                           0.48    986      4   1140     24    142
> queue0.q                          3.14    185      0    185    185      0
>
> We have not made any changes to our configuration.  Has any of you
> experienced a similar problem, or can you give me some
> hints about what to check?  Any help will be greatly appreciated.
> Thanks in advance,

Is there any helpful output in the command:

$ qquota -u "*"

BTW: Giving the rules names might make the output easier to read.

-- Reuti


>
> -Hugo


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net

