[GE users] CPU usage by array jobs

Daniel Templeton Dan.Templeton at Sun.COM
Thu Jun 28 19:05:45 BST 2007


Ryan,

Do note that this issue was opened 11 days ago.  We're good, but we're 
not that good.  The priority it has is the priority with which it was 
submitted (which is the default priority).  As you can see from the lack 
of additional comment beyond the initial bug report, we haven't 
evaluated or assigned it yet.  During the last 11 days, we've been 
working on getting the 6.0u11 release out the door.

Daniel

Ryan Thomas wrote:
> Someone filed issue #2298 to cover this
> (http://gridengine.sunsource.net/issues/show_bug.cgi?id=2298).
>
> But it seems that this hasn't been given a very high priority.  
>
> I think that this is a major defect.  Array jobs dramatically increase
> the scalability of the scheduler and they are also very convenient for
> all my users.
>
> Perhaps if more people are a little more vocal about this being an
> important issue, it will get more attention.
>  
> -----Original Message-----
> From: Iwona Sakrejda [mailto:isakrejda at lbl.gov] 
> Sent: Wednesday, June 27, 2007 12:01 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] CPU usage by array jobs
>
> Have any of the experts looked at the following problem? I wonder if you
> need more evidence, and if so, of what kind? This problem is really
> making it impossible to get users to stick with array jobs....
>
> Thanks a lot,
>
> Iwona
>
>
> Pascal Wassam wrote:
>   
>> I just conducted a test run.
>>
>> My notes:
>>
>> 4 nodes totaling 7 CPUs on all.q; each node has 4 slots in the queue
>> config.
>> SGE 6.1. All jobs are identical: cpuburn, set to run for 5 minutes.
>>
>> scheduler conf:
>>
>> policy_hierarchy OS
>> weight_tickets_share 100000
>>
>> share tree:
>>
>> id=0
>> name=template
>> type=0
>> shares=0
>> childnodes=1
>> id=1
>> name=default
>> type=0
>> shares=100
>> childnodes=NONE
>>
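The share tree above is a flat list of `key=value` records in which `childnodes` points at child `id`s. A minimal Python sketch that rebuilds the hierarchy from that text, assuming (not confirmed by the notes) that every record starts with an `id=` line and that `childnodes` is either `NONE` or a comma-separated list of ids. `Node`, `parse_share_tree` and `show` are hypothetical helpers, not part of Grid Engine.

```python
from dataclasses import dataclass, field

# Share tree exactly as listed in the test notes.
SHARE_TREE = """\
id=0
name=template
type=0
shares=0
childnodes=1
id=1
name=default
type=0
shares=100
childnodes=NONE
"""

@dataclass
class Node:
    id: int
    name: str
    type: int
    shares: int
    children: list[int] = field(default_factory=list)

def parse_share_tree(text: str) -> dict[int, Node]:
    """Group key=value lines into records (one per 'id=' line) and build nodes."""
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        if key == "id":
            records.append({})  # assumption: each node record starts with id=
        records[-1][key] = value
    nodes: dict[int, Node] = {}
    for rec in records:
        kids = rec.get("childnodes", "NONE")
        # assumption: childnodes is NONE or a comma-separated list of ids
        children = [] if kids == "NONE" else [int(k) for k in kids.split(",")]
        node = Node(int(rec["id"]), rec["name"], int(rec["type"]),
                    int(rec["shares"]), children)
        nodes[node.id] = node
    return nodes

def show(nodes: dict[int, Node], node_id: int = 0, depth: int = 0) -> None:
    """Print the tree, indenting each level by two spaces."""
    node = nodes[node_id]
    print("  " * depth + f"{node.name} (shares={node.shares})")
    for child in node.children:
        show(nodes, child, depth + 1)

show(parse_share_tree(SHARE_TREE))
```

This prints `template (shares=0)` followed by an indented `default (shares=100)`. As I understand sge_share_tree(5), a leaf named `default` stands in for each user who has no node of their own, so pascal and ben should be entitled to equal shares; treat that reading as an assumption.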
>> queue is disabled, and empty.
>> 1000 individual jobs are queued as user pascal
>> 1 array job of 1000 subjobs is queued as user ben
>>
>> usage is cleared (qconf -clearusage)
>>
>> at the starting line:
>>
>> Queued per user:
>>   1000 pascal qw
>>   1000 ben qw
>>
>> bang: qmod -e all.q
>>
>> 1 minute in:
>>
>> Running per user:
>>      8 pascal r
>>      8 ben r
>> Queued per user:
>>    992 pascal qw
>>    992 ben qw
>>
>> A while later:
>>
>> Running per user:
>>     10 pascal r
>>      1 ben r
>> Queued per user:
>>    991 ben qw
>>    973 pascal qw
>>
>> And it continues this way:
>>
>> Running per user:
>>      8 pascal r
>>      2 ben r
>> Queued per user:
>>    987 ben qw
>>    952 pascal qw
>>
>> -Pascal
>>
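The snapshots above can be turned into completed-job counts with simple arithmetic. A minimal Python sketch, assuming every job that is neither running nor queued has finished (jobs in an error state would break that assumption); `finished` and `completed_fraction` are hypothetical helper names.

```python
# Counts copied from the test notes: user -> (running, queued).
SNAPSHOTS = {
    "1 minute in": {"pascal": (8, 992), "ben": (8, 992)},
    "a while later": {"pascal": (10, 973), "ben": (1, 991)},
    "and it continues": {"pascal": (8, 952), "ben": (2, 987)},
}
SUBMITTED = 1000  # each user submitted 1000 identical 5-minute jobs/tasks

def finished(running: int, queued: int, submitted: int = SUBMITTED) -> int:
    """Assumption: anything neither running nor queued has completed."""
    return submitted - running - queued

def completed_fraction(users: dict[str, tuple[int, int]]):
    """Return finished counts per user and each user's fraction of the total."""
    done = {user: finished(r, q) for user, (r, q) in users.items()}
    total = sum(done.values())
    frac = {user: (d / total if total else 0.0) for user, d in done.items()}
    return done, frac

for label, users in SNAPSHOTS.items():
    done, frac = completed_fraction(users)
    print(label, done, {u: round(f, 2) for u, f in frac.items()})
```

By this arithmetic pascal has finished 17 jobs to ben's 8 (68% vs 32%) in the second snapshot and 40 to 11 in the third, whereas two users with equal shares running identical jobs would be expected to finish roughly the same number.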
>> Pascal Wassam wrote:
>>     
>>> I would like to second all the experiences Iwona has written about
>>> here. I will also attempt to conduct some tests and present something
>>> that is repeatable for developers to play with.
>>>
>>> -Pascal
>>>
>>> Iwona Sakrejda wrote:
>>>       
>>>> Since this is a somewhat different problem, I gave it a new title.
>>>>
>>>> Rayson Ho wrote:
>>>>         
>>>>>> Another problem I am having is that array jobs seem to be
>>>>>> overcharged when the usage is calculated (could you point me to the
>>>>>> section of code that deals with it? I'll be happy to read it). It
>>>>>> looks like each task of an array job gets the CPU usage of the whole
>>>>>> array. Array jobs are very helpful, but users are fleeing from them
>>>>>> in droves.....
>>>>>>
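One way to read the report that each array task is charged the CPU usage of the whole array is sketched below. This is a hypothetical model written to make the claim concrete, not Grid Engine's accounting code; `correct_usage` and `suspected_usage` are made-up names and the per-task CPU figures are illustrative only.

```python
# Hypothetical model of the reported overcharge; NOT Grid Engine source code.

def correct_usage(task_cpu_seconds: list[float]) -> float:
    """Expected: the owner is charged the sum of each task's own CPU time."""
    return sum(task_cpu_seconds)

def suspected_usage(task_cpu_seconds: list[float]) -> float:
    """Reported symptom: every task is charged the CPU time of the whole array."""
    array_total = sum(task_cpu_seconds)
    return array_total * len(task_cpu_seconds)

# Illustrative numbers: 8 tasks of one array, each having used 60 CPU seconds.
tasks = [60.0] * 8
print(correct_usage(tasks))    # 480.0
print(suspected_usage(tasks))  # 3840.0, i.e. 8x too high
```

Under this model the overcharge factor equals the number of tasks counted, so it grows with the size of the array; that would be consistent with both the throughput drop and the usage that appears right after `qconf -clearusage`, but it remains a guess until someone checks the actual code.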
>>>>> How can it be reproduced? Is it a parallel or serial job?
>>>>>           
>>>> It happens to serial jobs. I have not done thorough studies yet, but I
>>>> see that usage for owners of array jobs greatly exceeds what I estimate
>>>> it should be.
>>>>
>>>> Also, when I clear usage, only the usage from that moment on should be
>>>> taken into account, right? Yet I see that a user who has array jobs
>>>> immediately gets usage that exceeds what he has running at that point.
>>>>
>>>> Another shred of evidence is that when they switch from array jobs to
>>>> individual jobs, they get a throughput that they feel is consistent
>>>> with their share. If they use arrays, their throughput dives.
>>>>
>>>> I'll try to come up with a clean example with numbers. This is on
>>>> 6.0u4, and since I have to upgrade anyway, I was postponing further
>>>> studies in the hope that the upgrade would fix the problem. On the
>>>> other hand, it might not, and it really increases the load when
>>>> instead of 1 array job with 1000 members I get 1000 jobs.....
>>>>
>>>> And today I noticed the discussion about shares and CPU consumption, so
>>>> I hoped the right expert might be watching and it would be easy for
>>>> him to look at it...
>>>>
>>>> Iwona
>>>>
>>>>
>>>>         
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>

More information about the gridengine-users mailing list