[GE users] CPU usage by array jobs

Ryan Thomas Ryan.Thomas at nuance.com
Thu Jun 28 18:58:53 BST 2007


Someone filed issue #2298 to cover this
(http://gridengine.sunsource.net/issues/show_bug.cgi?id=2298),
but it seems that it hasn't been given a very high priority.

I think that this is a major defect.  Array jobs dramatically increase
the scalability of the scheduler and they are also very convenient for
all my users.

Perhaps if more people are a little more vocal about this being an
important issue it will get more attention.
 
-----Original Message-----
From: Iwona Sakrejda [mailto:isakrejda at lbl.gov] 
Sent: Wednesday, June 27, 2007 12:01 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] CPU usage by array jobs

Have any of the experts looked at the following problem? I wonder if you
need more evidence and, if so, of what kind. This problem is really
making it impossible to get users to stick with array jobs.

Thanks a lot,

Iwona


Pascal Wassam wrote:
> I just conducted a test run.
>
> My notes:
>
> 4 nodes totaling 7 CPUs on all.q; each node has 4 slots in the queue
> config.
> SGE 6.1. All jobs are identical, cpuburn, set to run for 5 minutes.
>
> scheduler conf:
>
> policy_hierarchy OS
> weight_tickets_share 100000
>
> share tree:
>
> id=0
> name=template
> type=0
> shares=0
> childnodes=1
> id=1
> name=default
> type=0
> shares=100
> childnodes=NONE
>
> queue is disabled, and empty.
> 1000 individual jobs are queued as user pascal
> 1 array job of 1000 subjobs is queued as user ben
>
> usage is cleared (qconf -clearusage)
>
> at the starting line:
>
> Queued per user:
>   1000 pascal qw
>   1000 ben qw
>
> bang: qmod -e all.q
>
> 1 minute in:
>
> Running per user:
>      8 pascal r
>      8 ben r
> Queued per user:
>    992 pascal qw
>    992 ben qw
>
> A while later:
>
> Running per user:
>     10 pascal r
>      1 ben r
> Queued per user:
>    991 ben qw
>    973 pascal qw
>
> And it continues this way:
>
> Running per user:
>      8 pascal r
>      2 ben r
> Queued per user:
>    987 ben qw
>    952 pascal qw
>
> -Pascal
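
For anyone who wants to repeat the run above, here is a minimal sketch of
the sequence as a shell script. It is not Pascal's actual script: the
function names, the DRY_RUN switch and ./burn.sh (a stand-in for whatever
wrapper runs cpuburn for 5 minutes) are assumptions made for illustration.
The two submit steps have to be run as two different users, and the tally
helper assumes the default qstat column layout.

```shell
#!/bin/sh
# Sketch of the test sequence from the notes above. By default (DRY_RUN=1)
# every SGE command is only printed, so nothing touches a live cluster.
DRY_RUN="${DRY_RUN:-1}"
QUEUE="all.q"            # queue name taken from the notes
NJOBS=1000               # jobs per user, taken from the notes
JOB_SCRIPT="./burn.sh"   # hypothetical wrapper that runs cpuburn for 5 minutes

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

prepare() {              # as an SGE manager: disable the queue, clear usage
    run qmod -d "$QUEUE"
    run qconf -clearusage
}

submit_individual() {    # as the first user: NJOBS separate jobs
    i=1
    while [ "$i" -le "$NJOBS" ]; do
        run qsub "$JOB_SCRIPT"
        i=$((i + 1))
    done
}

submit_array() {         # as the second user: one array job with NJOBS tasks
    run qsub -t "1-$NJOBS" "$JOB_SCRIPT"
}

open_queue() {           # "bang": enable the queue
    run qmod -e "$QUEUE"
}

# Count jobs per user and state from qstat output on stdin. Assumes the
# default qstat layout: two header lines, user in column 4, state in column 5.
tally_by_user() {
    awk 'NR > 2 { n[$4 " " $5]++ }
         END { for (k in n) printf "%6d %s\n", n[k], k }' | sort -k2,2 -k3,3
}
```

Once the queue is enabled, something like `qstat -u '*' -g d | tally_by_user`
should give per-user counts in the style of the listings above; `-g d` makes
qstat print one line per array task, so pending array tasks are counted
individually rather than as a single line. Set DRY_RUN=0 only on a test
cluster.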
>
> Pascal Wassam wrote:
>> I would like to second all the experiences Iwona has written about
>> here. I will also attempt to conduct some tests and present something
>> that is repeatable for developers to play with.
>>
>> -Pascal
>>
>> Iwona Sakrejda wrote:
>>> Since this is a somewhat different problem I gave it a new title.
>>>
>>> Rayson Ho wrote:
>>>>> Another problem I am having is that array jobs seem to be overcharged
>>>>> when the usage is calculated (could you point me to the section of
>>>>> code that deals with it? I'll be happy to read it). Looks like each
>>>>> array job gets the CPU usage of the whole array. Array jobs are very
>>>>> helpful but users are fleeing from them in droves.
>>>>
>>>> How to reproduce it?? Is it a parallel or serial job??
>>> It happens to serial jobs. I have not done thorough studies yet, but
>>> I see that usage for owners of array jobs greatly exceeds what I
>>> estimate it should be.
>>>
>>> Also when I clear usage, then only the usage from that moment should be
>>> taken into account - right? And I see that a user who has an array
>>> job right away gets usage that exceeds what he has running at that point.
>>>
>>> Another shred of evidence is that when they switch from array jobs to
>>> individual jobs, they get a throughput that they feel is consistent
>>> with their share. If they use arrays their throughput dives.
>>>
>>> I'll try to come up with a clean example with numbers. It is in 6.0u4,
>>> so since I have to upgrade anyway I was postponing more studies, hoping
>>> that the upgrade will fix the problem. On the other hand it might not,
>>> and it really increases the load when instead of 1 array job with 1000
>>> members I get 1000 jobs.
>>>
>>> And today I noticed that discussion about shares and CPU consumption, so
>>> I hoped the right expert might be watching and it would be easy for
>>> him to look at it...
>>>
>>> Iwona
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>



