[GE users] CPU usage by array jobs

Andreas.Haas at Sun.COM
Fri Jun 29 10:20:50 BST 2007


On Thu, 28 Jun 2007, Ryan Thomas wrote:

> Give yourself some credit, Dan. You're actually much better than you
> think.
>
> According to Andreas this was fixed in 6.0u11 as it was a duplicate of
> issue 2222.  So not only has it been fixed, but it's been shipped.

Ryan, in Germany we have a saying:

  "Du sollst den Tag nicht vor dem Abend loben"
  "Don't praise the day before evening has come"

I found that #2298 cannot be a duplicate, since #2222 was also fixed
in 6.1, and Pascal's test below was run on 6.1 :-/

What are you using for

    weight_urgency
    weight_ticket
    weight_waiting_time

in sched_conf(5)? If your waiting time weight is non-zero, this could
cause the phenomenon you observe. The reason is that waiting time
contributes to job urgency, and urgency has a higher weight than the
ticket policy.
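To illustrate how a non-zero waiting time weight can override the ticket policy, here is a minimal Python sketch of the dispatch-priority formula as described in sge_priority(5): prio = weight_priority * pprio + weight_urgency * normalized(urg) + weight_ticket * normalized(tckts), with urg = rrcontr + wtcontr + dlcontr and wtcontr = waiting_time * weight_waiting_time. Assumptions: normalization is min-max across the pending jobs, the deadline contribution is zero, the weight values are illustrative, and the job names, ticket counts and resource urgency of 1000 are made up. It is a sketch, not the scheduler's code.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    name: str
    posix_priority: int      # qsub -p value, 0 by default
    resource_urgency: float  # rrcontr; 1000 per slot assumed here
    waiting_time_s: float    # seconds since submission
    tickets: float           # tickets from the ticket policies (share tree)


def normalize(values):
    """Min-max normalize to [0, 1] across the pending list (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Identical values cannot change the ordering; map them all to 0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def dispatch_priorities(jobs, weight_priority=1.0, weight_urgency=0.1,
                        weight_ticket=0.01, weight_waiting_time=0.0):
    """Return {job name: prio}; a higher prio is dispatched first.

    The deadline contribution (dlcontr) is left out, i.e. no deadline jobs.
    """
    urgency = [j.resource_urgency + j.waiting_time_s * weight_waiting_time
               for j in jobs]
    pprio = normalize([j.posix_priority for j in jobs])
    nurg = normalize(urgency)
    ntck = normalize([j.tickets for j in jobs])
    return {
        j.name: weight_priority * p + weight_urgency * u + weight_ticket * t
        for j, p, u, t in zip(jobs, pprio, nurg, ntck)
    }


# Hypothetical example: an older job with few tickets versus a newer job
# that the share tree favours.
jobs = [
    PendingJob("older_job", 0, 1000.0, 600.0, 10.0),
    PendingJob("newer_job", 0, 1000.0, 60.0, 90.0),
]
print(dispatch_priorities(jobs))                           # tickets decide
print(dispatch_priorities(jobs, weight_waiting_time=1.0))  # waiting time decides
```

With weight_waiting_time at 0 the job with more tickets ranks first; at 1.0 the older job's normalized urgency (weighted 0.1) outweighs the full ticket contribution (weighted 0.01).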

Regards,
Andreas



>
> Thanks!
>
> -----Original Message-----
> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
> Sent: Thursday, June 28, 2007 2:06 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] CPU usage by array jobs
>
> Ryan,
>
> Do note that this issue was opened 11 days ago.  We're good, but we're
> not that good.  The priority it has is the priority with which it was
> submitted (which is the default priority).  As you can see from the lack
> of additional comment beyond the initial bug report, we haven't
> evaluated or assigned it yet.  During the last 11 days, we've been
> working on getting the 6.0u11 release out the door.
>
> Daniel
>
> Ryan Thomas wrote:
>> Someone opened issue #2298 to cover this
>> (http://gridengine.sunsource.net/issues/show_bug.cgi?id=2298)
>>
>> But it seems that this hasn't been given a very high priority.
>>
>> I think that this is a major defect.  Array jobs dramatically increase
>> the scalability of the scheduler and they are also very convenient for
>> all my users.
>>
>> Perhaps if more people are a little more vocal about this being an
>> important issue it will get more attention.
>>
>> -----Original Message-----
>> From: Iwona Sakrejda [mailto:isakrejda at lbl.gov]
>> Sent: Wednesday, June 27, 2007 12:01 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] CPU usage by array jobs
>>
>> Have any of the experts looked at the following problem? I wonder if
>> you need more evidence, and if so, of what kind? This problem is really
>> making it impossible to get users to stick with array jobs....
>>
>> Thanks a lot,
>>
>> Iwona
>>
>>
>> Pascal Wassam wrote:
>>
>>> I just conducted a test run.
>>>
>>> My notes:
>>>
>>> 4 nodes totaling 7 CPUs in all.q; each node has 4 slots in the queue
>>> configuration.
>>> SGE 6.1. All jobs are identical, cpuburn, set to run for 5 minutes.
>>>
>>> scheduler conf:
>>>
>>> policy_hierarchy OS
>>> weight_tickets_share 100000
>>>
>>> share tree:
>>>
>>> id=0
>>> name=template
>>> type=0
>>> shares=0
>>> childnodes=1
>>> id=1
>>> name=default
>>> type=0
>>> shares=100
>>> childnodes=NONE
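The share tree above uses the plain-text node format printed by qconf -sstree. As a rough check of what it should produce, here is a Python sketch that parses that format and computes each user's target fraction. It assumes the share_tree(5) meaning of a leaf named default (every user without an explicit node gets their own copy with the default node's shares) and only handles nodes directly below the root; the function names are hypothetical.

```python
SHARE_TREE = """\
id=0
name=template
type=0
shares=0
childnodes=1
id=1
name=default
type=0
shares=100
childnodes=NONE
"""


def parse_share_tree(text):
    """Parse 'key=value' node records into {id: node dict}."""
    nodes, current = {}, None
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "id":
            current = {"id": int(value)}
            nodes[current["id"]] = current
        elif key == "childnodes":
            current["children"] = (
                [] if value == "NONE" else [int(c) for c in value.split(",")]
            )
        elif key in ("type", "shares"):
            current[key] = int(value)
        else:
            current[key] = value
    return nodes


def user_targets(nodes, active_users, root_id=0):
    """Target share fraction per user, one level below the root only."""
    children = [nodes[c] for c in nodes[root_id]["children"]]
    explicit = {c["name"] for c in children if c["name"] != "default"}
    entries = [(c["name"], c["shares"]) for c in children
               if c["name"] != "default"]
    for c in children:
        if c["name"] == "default":
            # Each user without an explicit node gets a copy of "default".
            entries += [(u, c["shares"]) for u in active_users
                        if u not in explicit]
    total = sum(shares for _, shares in entries)
    if total == 0:
        return {}
    return {name: shares / total for name, shares in entries}


tree = parse_share_tree(SHARE_TREE)
print(user_targets(tree, ["pascal", "ben"]))
```

Under these assumptions pascal and ben each target half of the cluster.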
>>>
>>> queue is disabled, and empty.
>>> 1000 individual jobs are queued as user pascal
>>> 1 array job of 1000 subjobs is queued as user ben
>>>
>>> usage is cleared (qconf -clearusage)
>>>
>>> at the starting line:
>>>
>>> Queued per user:
>>>   1000 pascal qw
>>>   1000 ben qw
>>>
>>> bang: qmod -e all.q
>>>
>>> 1 minute in:
>>>
>>> Running per user:
>>>      8 pascal r
>>>      8 ben r
>>> Queued per user:
>>>    992 pascal qw
>>>    992 ben qw
>>>
>>> A while later:
>>>
>>> Running per user:
>>>     10 pascal r
>>>      1 ben r
>>> Queued per user:
>>>    991 ben qw
>>>    973 pascal qw
>>>
>>> And it continues this way:
>>>
>>> Running per user:
>>>      8 pascal r
>>>      2 ben r
>>> Queued per user:
>>>    987 ben qw
>>>    952 pascal qw
>>>
>>> -Pascal
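The snapshots above can be turned into completion counts. Each user submitted 1000 jobs (or array tasks), so, assuming every job is either running, queued, or finished (no error or hold states), finished = 1000 - running - queued. A small Python sketch of that arithmetic, with the numbers copied from the test notes:

```python
TOTAL_PER_USER = 1000  # 1000 single jobs for pascal, 1000 array tasks for ben

# (snapshot, {user: (running, queued)}), copied from the test notes
SNAPSHOTS = [
    ("1 minute in", {"pascal": (8, 992), "ben": (8, 992)}),
    ("a while later", {"pascal": (10, 973), "ben": (1, 991)}),
    ("and it continues", {"pascal": (8, 952), "ben": (2, 987)}),
]


def completion_summary(snapshots, total=TOTAL_PER_USER):
    """Return (label, finished per user, array owner's share of finished)."""
    rows = []
    for label, counts in snapshots:
        # Assumes no job is in an error or hold state.
        done = {user: total - r - q for user, (r, q) in counts.items()}
        all_done = sum(done.values())
        ben_share = done["ben"] / all_done if all_done else None
        rows.append((label, done, ben_share))
    return rows


for label, done, share in completion_summary(SNAPSHOTS):
    shown = "n/a" if share is None else f"{share:.0%}"
    print(f"{label}: finished {done}, array owner's share {shown}")
```

One minute in, all 16 slots (4 nodes x 4 slots) are busy, split 8/8. By the last snapshot pascal has finished 40 jobs and ben 11, so the array owner got about 22% of the completed work (11 of 51), well short of the even split one would expect from the share tree above.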
>>>
>>> Pascal Wassam wrote:
>>>
>>>> I would like to second all the experiences Iwona has written about
>>>> here. I will also attempt to conduct some tests and present something
>>>> that is repeatable for developers to play with.
>>>>
>>>> -Pascal
>>>>
>>>> Iwona Sakrejda wrote:
>>>>
>>>>> Since this is a somewhat different problem, I gave it a new title.
>>>>>
>>>>> Rayson Ho wrote:
>>>>>
>>>>>>> Another problem I am having is that array jobs seem to be overcharged
>>>>>>> when the usage is calculated (could you point me to the section of
>>>>>>> code that deals with it? I'll be happy to read it). It looks like each
>>>>>>> task of an array job gets the CPU usage of the whole array. Array jobs
>>>>>>> are very helpful, but users are fleeing from them in droves.....
>>>>>>>
>>>>>> How to reproduce it?? Is it a parallel or serial job??
>>>>>>
>>>>> It happens to serial jobs. I have not done thorough studies yet, but
>>>>> I see that usage for owners of array jobs greatly exceeds what I
>>>>> estimate it should be.
>>>>>
>>>>> Also, when I clear usage, only the usage from that moment on should be
>>>>> taken into account, right? Yet I see that a user who has array jobs
>>>>> immediately gets usage that exceeds what he has running at that point.
>>>>>
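If the suspicion above is right, the effect should scale with the number of concurrently running tasks. A toy Python model of one accounting interval (hypothetical, not Grid Engine's accounting code), comparing the expected charge with one in which every task is billed the whole array's CPU time:

```python
def charged_usage(tasks_running, cpu_per_task, overcharge):
    """CPU seconds charged to the array owner for one accounting interval.

    overcharge=False: each task is charged its own CPU time (expected).
    overcharge=True:  each task is charged the whole array's CPU time
                      (the behaviour suspected in this thread).
    """
    array_total = tasks_running * cpu_per_task
    per_task = array_total if overcharge else cpu_per_task
    return tasks_running * per_task


# Hypothetical interval: 8 tasks running, 60 CPU seconds each.
expected = charged_usage(8, 60.0, overcharge=False)
suspected = charged_usage(8, 60.0, overcharge=True)
print(expected, suspected, suspected / expected)
```

In this model the inflation factor equals the number of running tasks, which would fit an array owner showing more usage right after a clear than their running jobs account for.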
>>>>> Another shred of evidence is that when users switch from array jobs to
>>>>> individual jobs, they get a throughput that they feel is consistent
>>>>> with their share. If they use arrays, their throughput dives.
>>>>>
>>>>> I'll try to come up with a clean example with numbers. This is on
>>>>> 6.0u4, and since I have to upgrade anyway, I was postponing further
>>>>> study, hoping that the upgrade would fix the problem. On the other hand,
>>>>> it might not, and it really increases the load when instead of 1 array
>>>>> job with 1000 members I get 1000 jobs.....
>>>>>
>>>>> And today I noticed the discussion about shares and CPU consumption,
>>>>> so I hoped the right expert might be watching and it would be easy for
>>>>> him to look at it...
>>>>>
>>>>> Iwona
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>

http://gridengine.info/

