[GE users] CPU usage by array jobs

Pascal Wassam pascal at blur.compbio.ucsf.edu
Thu Jun 21 21:39:18 BST 2007


I just conducted a test run.

My notes:

4 nodes totaling 7 CPUs in all.q; each node has 4 slots in the queue config
(16 slots in total). SGE 6.1. All jobs are identical: cpuburn, set to run for
5 minutes.

scheduler conf:

policy_hierarchy OS
weight_tickets_share 100000

share tree:

id=0
name=template
type=0
shares=0
childnodes=1
id=1
name=default
type=0
shares=100
childnodes=NONE

queue is disabled, and empty.
1000 individual jobs are queued as user pascal
1 array job of 1000 subjobs is queued as user ben

usage is cleared (qconf -clearusage)

at the starting line:

Queued per user:
   1000 pascal qw
   1000 ben qw

bang: qmod -e all.q

1 minute in:

Running per user:
      8 pascal r
      8 ben r
Queued per user:
    992 pascal qw
    992 ben qw

A while later:

Running per user:
     10 pascal r
      1 ben r
Queued per user:
    991 ben qw
    973 pascal qw

And it continues this way:

Running per user:
      8 pascal r
      2 ben r
Queued per user:
    987 ben qw
    952 pascal qw
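
A rough tally of the three snapshots above, as a minimal sketch. The counts are copied from the listings; the helper names (`dispatched`, `summarize`) are made up for illustration and are not part of SGE. "Dispatched" here means tasks that have left the queue, whether still running or already finished.

```python
# Per-user (running, queued) counts copied from the snapshots above.
SUBMITTED = 1000  # tasks submitted per user

snapshots = [
    ("1 minute in",   {"pascal": (8, 992),  "ben": (8, 992)}),
    ("a while later", {"pascal": (10, 973), "ben": (1, 991)}),
    ("later still",   {"pascal": (8, 952),  "ben": (2, 987)}),
]


def dispatched(queued, submitted=SUBMITTED):
    """Tasks that have left the queue (running now or already finished)."""
    return submitted - queued


def summarize(counts):
    """Return per-user dispatched count and fraction of running tasks."""
    total_running = sum(running for running, _ in counts.values())
    return {
        user: {
            "dispatched": dispatched(queued),
            "slot_share": running / total_running,
        }
        for user, (running, queued) in counts.items()
    }


for label, counts in snapshots:
    for user, stats in summarize(counts).items():
        print(f"{label:14s} {user:7s} "
              f"dispatched={stats['dispatched']:3d} "
              f"slot_share={stats['slot_share']:.2f}")
```

By the last snapshot, 48 of pascal's individual jobs have been dispatched against 13 of ben's array tasks, although both users sit under the same default node in the share tree.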

-Pascal

Pascal Wassam wrote:
> I would like to second all the experiences Iwona has written about 
> here. I will also attempt to conduct some tests and present something 
> that is repeatable for developers to play with.
>
> -Pascal
>
> Iwona Sakrejda wrote:
>> Since this is a somewhat different problem I gave it a new title.
>>
>> Rayson Ho wrote:
>>>> Another problem I am having is that array jobs seem to be overcharged
>>>> when the usage is calculated (could you point me to the section of
>>>> code that deals with it? I'll be happy to read it). It looks like each
>>>> array job gets the CPU usage of the whole array. Array jobs are very
>>>> helpful, but users are fleeing from them in droves.
>>>
>>> How to reproduce it?? Is it a parallel or serial job??
>> It happens to serial jobs. I have not done thorough studies yet, but
>> I see that usage for owners of array jobs greatly exceeds what I
>> estimate it should be.
>>
>> Also, when I clear usage, only the usage from that moment on should be
>> taken into account, right? Yet I see that a user who has array jobs
>> immediately gets usage that exceeds what he has running at that point.
>>
>> Another shred of evidence is that when they switch from array jobs to
>> individual jobs, they get a throughput that they feel is consistent
>> with their share. If they use arrays, their throughput dives.
>>
>> I'll try to come up with a clean example with numbers. This is on
>> 6.0u4, and since I have to upgrade anyway I was postponing more studies,
>> hoping that the upgrade will fix the problem. On the other hand it
>> might not, and it really increases the load when instead of 1 array job
>> with 1000 members I get 1000 jobs.
>>
>> And today I noticed that discussion about shares and CPU consumption, so
>> I hoped the right expert might be watching and it would be easy for
>> him to look at it...
>>
>> Iwona
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>

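To put a number on the overcharging Iwona suspects in the quoted message, here is a toy calculation. It only illustrates the hypothesis that each array task is booked the CPU usage of the whole array; it is not taken from the SGE source, and `charged_usage` is a hypothetical helper. The figure of 60 CPU seconds per task is an assumption chosen for round numbers.

```python
def charged_usage(task_cpu_seconds, overcharge):
    """Total CPU seconds booked to the owner of one array job.

    overcharge=False: each task is booked only its own CPU time.
    overcharge=True:  each task is booked the CPU time of the whole
                      array (the suspected behaviour, hypothetical).
    """
    array_total = sum(task_cpu_seconds)
    if overcharge:
        # Every one of the N tasks carries the full array total.
        return array_total * len(task_cpu_seconds)
    return array_total


# Assumed example: 8 running tasks, 60 CPU seconds each.
tasks = [60.0] * 8
expected = charged_usage(tasks, overcharge=False)
suspected = charged_usage(tasks, overcharge=True)
print(expected, suspected, suspected / expected)
```

Under that assumption the booked usage grows with the number of running tasks (a factor of 8 here), which would push the array owner's share-tree priority down much faster than a user running the same work as individual jobs.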