[GE users] some startup questions....

templedf dan.templeton at sun.com
Tue Mar 17 14:16:22 GMT 2009



cjf001 wrote:
> Daniel, and anyone else that wants to jump in here -
>
> thanks for the info.  I'm downloading the beginner's guide that you pointed
> me to right now, and will read through it later today.  Also thanks
> for the picture - that helps, I think !
>
> So, based on the picture and your responses, I'd like to clarify a
> couple of things with you, if I could....
>
> 1 - so when a job is submitted, it's "waiting" until a suitable
>     queue is found, and when it's placed in the queue, it's
>     running - is that right ?
>
>     (this is probably the most confusing part for me - a "queue" is
>     usually a place to wait for something to happen - but in SGE,
>     it appears that a "queue" is a place where something really IS
>     happening - ie, the job is running - and while the jobs are
>     waiting to be placed in a queue, they are all in a waiting
>     state)
>   

You got it.
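
If it helps to see that as code, here is a minimal toy model of the
picture. It is only a sketch and not how SGE is implemented: the
ToyQueue class, the schedule() function and the job names are all made
up for illustration. The one thing it tries to show is that the pending
list is the place where jobs wait (qw), and a queue is a set of slots
where jobs run (r).

```python
from collections import deque


class ToyQueue:
    """Hypothetical stand-in for a queue: holds running jobs, up to `slots`."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.running = []

    def free_slots(self):
        return self.slots - len(self.running)


def schedule(pending, queues):
    """Move jobs from the pending list into queues that have free slots.

    Returns a dict mapping each job to a state: 'r' once it has been
    placed in a queue, 'qw' while it is still in the pending list.
    """
    states = {}
    for queue in queues:
        while pending and queue.free_slots() > 0:
            job = pending.popleft()
            queue.running.append(job)
            states[job] = "r"
    for job in pending:
        states[job] = "qw"
    return states


pending = deque(["job1", "job2", "job3"])
queues = [ToyQueue("all.q", slots=2)]
states = schedule(pending, queues)
print(states)
```

With two slots and three jobs, job1 and job2 end up running in the
queue and job3 stays in the pending list until a slot frees up.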

> 2 - if I got #1 right, then is there a way to tell which of the
>     jobs that are waiting to be assigned a queue will be done
>     next ?  In other words, is there a priority to the waiting
>     jobs, and is there a way to view that ? Is that where the
>     tickets come in ?
>   

By default, qstat sorts jobs in priority order.  Depending on what 
scheduling policies you have active, there are various switches to qstat 
(-ext, -urg, -pri, etc.) to show the numbers behind that sort order.
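
As a rough illustration of what "priority order" means for the pending
list, here is a small sketch. The job ids and priority values are
invented, and breaking ties by the lower job id is my assumption for
the example, not something to rely on.

```python
# Toy pending jobs as (job_id, priority). All numbers are made up.
pending = [(101, 0.50), (102, 0.75), (103, 0.50), (104, 1.00)]


def priority_order(jobs):
    """Highest priority first; ties go to the lower job id (an assumption)."""
    return sorted(jobs, key=lambda job: (-job[1], job[0]))


ordered = priority_order(pending)
for job_id, prio in ordered:
    print(f"{job_id}  prio={prio:.2f}")
```

The job at the top of that list is the one the scheduler would try to
place first, provided a suitable slot is free for it.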

> 3 - it sounds like the only way to get the preemption, or job
>     suspending, is to use subordinate queues.
>   

Preemption only comes from queue subordination.  Job suspension can come 
from several places, including manual intervention, calendar-based queue 
suspension, a queue crossing its suspend threshold, etc.
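
To make the subordination idea a bit more concrete, here is a toy
sketch of the rule as I understand it from the subordinate_list
setting in queue_conf(5): on a given host, the subordinate queue is
suspended once the superior queue has a certain number of slots in
use. The function name and arguments are hypothetical, and treating a
missing threshold as "superior queue is full" is an assumption you
should check against the man page for your version.

```python
def subordinate_suspended(used_slots, total_slots, threshold=None):
    """Return True if the subordinate queue on this host would be suspended.

    used_slots / total_slots describe the superior queue on the host.
    threshold=None models "suspend only when the superior queue is
    full" (an assumption for this sketch).
    """
    if threshold is None:
        threshold = total_slots
    return used_slots >= threshold


# Superior queue with 4 slots on the host, no explicit threshold.
print(subordinate_suspended(0, 4))
print(subordinate_suspended(4, 4))
# Same queue, but suspend the subordinate as soon as 1 slot is used.
print(subordinate_suspended(1, 4, threshold=1))
```

Note that nothing in this rule looks at job priorities. The suspension
is triggered purely by how busy the superior queue is, which is why it
only looks like preemption from the outside.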

Daniel

> thanks again -
>
>      John
>
>
>
> templedf wrote:
>
>   
>> cjf001 wrote:
>>
>>     
>>> SGE Users -
>>>
>>> new SGE admin here, coming up to speed on a system that I've inherited
>>> from a guy that "moved on".....  Nice thing is that those that are left
>>> are somewhat confused about how the system is set up, so I will probably
>>> be able to just configure it the way I want, in conjunction with a
>>> version upgrade (to 6.2)....
>>>
>>> I've read the user's manual, admin manual, install manual, and a few of
>>> the other things on the website, including the very-helpful
>>> "SCHEDULER POLICIES FOR JOB PRIORITIZATION IN THE SUN N1™ GRID ENGINE 6 SYSTEM"
>>> whitepaper by Charu Chaubal. I've played around with some of the commands
>>> on the existing system.  But, I have some questions, some probably stupid,
>>> so be nice.... :)
>>>
>>> 1) what the heck does the "N1" in "N1 Grid Engine" mean ?!
>>>  
>>>       
>> Don't ask.  It's gone now, so let's forget that it ever existed.
>>
>>
>>     
>>> (BTW, in the following questions, I'm talking about CLUSTER queues unless
>>>  I specifically say otherwise, which I never do....)
>>>
>>> 2) I'm confused about the states of a job.  When it's submitted, using
>>>    qsub, is the job immediately and always sent to a queue ? If not,
>>>    where is it, and how would I see it ?
>>>  
>>>       
>> When a job is in the queued and waiting state (qw), it is still in the 
>> pending job list waiting to be assigned to a queue.  When a job is in 
>> the running state (r), it is assigned to a queue.  I don't see how 
>> that's confusing.  ;)  qstat will show you jobs in both (all) states.
>>
>>
>>     
>>> 3) This question kind of depends on the answer to the one above, but I'll
>>>    ask it anyway... when a job is in a queue, does that mean it's running ?
>>>    If not, which I assume is the answer, then can more than one job in a
>>>    queue be running at the same time ?
>>>
>>> 4) The jobs in a queue are re-prioritized at each scheduling interval, correct ?
>>>    So it's possible that a job that's not running (in a queue) could all of
>>>    a sudden get a higher priority (say due to some override tickets assigned
>>>    to it) than a running job, and so the running job is suspended - is that
>>>    how it works ?
>>>  
>>>       
>> Nope.  SGE is not natively preemptive.  Once a job is scheduled to a 
>> queue, it runs to completion, unless it fails or is canceled.  The 
>> exception to that rule is queue subordination, which introduces a sort 
>> of aftermarket preemption.
>>
>>
>>     
>>> 5) somewhat related to the previous question, maybe, but in Charu's whitepaper
>>>    he talks about a "dispatch priority" - is this something different than
>>>    the priority of the jobs in a queue ?
>>>  
>>>       
>> I'd have to go read the paper again to know what he meant.  There's only 
>> one priority that's relevant, and that's the priority the job is 
>> assigned while waiting to be scheduled.
>>
>>
>>     
>>> 6) I'm searching for a "good" way to visualize in my mind, if not on paper,
>>>    what the SGE queueing system looks like - does anyone have such a thing ?
>>>    For instance, can a queue be represented by a vertical tube, where jobs are
>>>    dropped into the top, and come out the bottom when they are ready to be
>>>    run ?  (probably not, eh ?!) Or do they not come out of the tube until their
>>>    run is completed, and more than one can be running at once ? (getting back
>>>    to a previous question)
>>>  
>>>       
>> See the attached slide.
>>
>>
>>     
>>> and now for something that has nothing to do with queues, I think -
>>>
>>> 7) how do you handle clusters that are made up of many types of machines, some
>>>    of which are quad-core, some of which dual-core, and some single-core ? If
>>>    a job only requires a single core, does that mean that SGE can/will submit
>>>    4 separate jobs to a quad-core machine ?
>>>  
>>>       
>> Yep.  SGE schedules jobs to job slots.  You can assign however many 
>> job slots to a machine you'd like.  The default assumption is that 
>> slots = cores.
>>
>>
>>     
>>>      Thanks for the help !!!
>>>
>>>         John
>>>
>>>  
>>>       
>> You should also have a look at 
>> http://www.sun.com/offers/details/Sun_Grid_Engine_62_install_and_config.html
>>
>> Daniel
>>
>>
>>
>>     
>>>
>>>
>>>       
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=133644
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>     
>
>
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=134331



