[GE users] policy conflict ?

reuti reuti at staff.uni-marburg.de
Sat Sep 5 00:15:08 BST 2009


On 05.09.2009 at 01:05, cjf001 wrote:

> Well, maybe....
>
> I already have different sequence numbers on the queues.
>
> How would your "b" suggestion "change the equation" ?

Once a PE is selected, you will get slots only from that one. Since
each PE is attached to only one queue, all slots will be primary or
all will be secondary, but never a mix. So it can no longer happen
that a job gets two slots on one and the same host but from
different queues.
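
To illustrate the effect, here is a toy model in Python. It is only a sketch and not how the SGE scheduler is implemented: the slot counts, the host names node1/node2 and the helper names allocate and mixes_queues_on_a_host are all made up for this example. It only shows why a wildcard request over one-PE-per-queue cannot mix queues on a host.

```python
from fnmatch import fnmatch

# Hypothetical free slots per (host, queue); the numbers are invented.
FREE_SLOTS = {
    ("node1", "primary"): 1,
    ("node1", "secondary"): 1,
    ("node2", "primary"): 2,
    ("node2", "secondary"): 2,
}

# One PE attached to both queues versus one PE per queue.
SHARED = {"mpich": {"primary", "secondary"}}
SPLIT = {"mpich1": {"primary"}, "mpich2": {"secondary"}}


def allocate(pe_request, slots_needed, pe_to_queues, free_slots):
    """Try each PE matching the request; take slots only from that PE's queues."""
    for pe in sorted(pe_to_queues):
        if not fnmatch(pe, pe_request):
            continue
        granted = []
        remaining = slots_needed
        for (host, queue), free in sorted(free_slots.items()):
            if remaining == 0 or queue not in pe_to_queues[pe]:
                continue
            take = min(free, remaining)
            granted.append((host, queue, take))
            remaining -= take
        if remaining == 0:
            return pe, granted
    return None, []


def mixes_queues_on_a_host(granted):
    """True if any host contributes slots from more than one queue."""
    queues_per_host = {}
    for host, queue, _ in granted:
        queues_per_host.setdefault(host, set()).add(queue)
    return any(len(queues) > 1 for queues in queues_per_host.values())


shared_pe, shared_grant = allocate("mpich", 2, SHARED, FREE_SLOTS)
split_pe, split_grant = allocate("mpich*", 2, SPLIT, FREE_SLOTS)
print(shared_pe, shared_grant, mixes_queues_on_a_host(shared_grant))
print(split_pe, split_grant, mixes_queues_on_a_host(split_grant))
```

In this model the shared PE picks up one primary and one secondary slot on node1, while the mpich* request settles on mpich1 and stays inside the primary queue.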

I usually avoid attaching the same PE to different queues for another
reason: the queue name is part of the $TMPDIR name, so $TMPDIR will
have a different name in each queue. Many parallel applications pass
the $TMPDIR name that the master process of the parallel job got on
to the slave processes they spawn, and when the slaves can't find
that directory on their node, they crash.
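
A small Python sketch of the naming problem follows. It assumes the usual layout <tmpdir>/<job_id>.<task_id>.<queue_name> with /tmp as the queue's tmpdir; the helper name sge_tmpdir and the task id 1 are hypothetical and chosen only for illustration.

```python
import posixpath


def sge_tmpdir(base, job_id, task_id, queue):
    """Build a $TMPDIR-style path; the layout is an assumption, see above."""
    return posixpath.join(base, f"{job_id}.{task_id}.{queue}")


# The master task runs in the primary queue and hands its $TMPDIR to the slaves.
master_tmpdir = sge_tmpdir("/tmp", 12873, 1, "primary")

# A slave whose slot came from the secondary queue gets a different directory.
slave_tmpdir = sge_tmpdir("/tmp", 12873, 1, "secondary")

print(master_tmpdir)
print(slave_tmpdir)
print("slave finds master's directory:", master_tmpdir == slave_tmpdir)
```

If all slots of a job come from one queue, the two names are identical and the slaves find the directory they were told about.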

-- Reuti


> Why is the scheduler apparently "giving up" and skipping the
> last few jobs ?  Is it running out of time, or does it figure
> that it will just keep running into errors ?
>
>
>     Thanks !
>
>       John
>
>
> reuti wrote:
>
>> On 05.09.2009 at 00:43, cjf001 wrote:
>>
>>
>>> Yes, at least some of them are. Is that a problem ? What
>>> policy is this in conflict with ?
>>
>>
>> SGE is just collecting slots from available queues - then it
>> discovers that the job would block itself. What you can try is:
>>
>> a) use different sequence numbers for the primary and secondary queue
>>
>> b) duplicate the PE and name the copies mpich1 and mpich2 or similar,
>> attach each to one and only one queue and request mpich* as the PE
>>
>> HTH - Reuti
>>
>>
>>
>>>
>>> I found some info about this at :
>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=437
>>>
>>> Unfortunately, this says :
>>>   "Extensive discussion of the topic can be found under
>>>   http://gridengine.sunsource.net/servlets/BrowseList?list=users&by=thread&from=944"
>>>
>>> which sounds very promising, but that url is not found anymore.
>>>
>>>
>>>   Thanks,
>>>
>>>      John
>>>
>>>
>>> reuti wrote:
>>>
>>>> On 05.09.2009 at 00:12, cjf001 wrote:
>>>>
>>>>
>>>>
>>>>> Guys, I've got a problem on SGE v6.2u2 that just showed up yesterday,
>>>>> as far as I can tell - I'm getting the following in the qmaster's
>>>>> messages file:
>>>>>
>>>>> 09/04/2009 17:05:24|schedu|lxadml2|W|Jobs 12873 & 12873 dispatched
>>>>> to master/subordinated queues
>>>>> "primary@lxdel20.srl.css.mot.com"/"secondary@lxdel20.srl.css.mot.com".
>>>>> Suspend on subordinate to occur in same scheduling
>>>>> interval. Policy conflict!
>>>>>
>>>>> ... this repeats with a few more jobs, and then ...
>>>>>
>>>>> 09/04/2009 17:05:24|worker|lxadml2|W|Skipping remaining 7 orders
>>>>
>>>>
>>>> Is it a parallel job which might get slots from both queues - the
>>>> superordinated and subordinated - at the same time?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>
>>>>> Note that the two jobs mentioned are the same job number, but this
>>>>> is not always the case.
>>>>>
>>>>>
>>>>> Any idea what's going on here ?  The problem this is causing is that
>>>>> not all the jobs are getting assigned to a queue, even though there
>>>>> are open resources. Also, the qstat listing shows many jobs with "0"
>>>>> priority, apparently because they are being "skipped" and have never
>>>>> been viewed yet by the scheduler.
>>>>>
>>>>> Any help from the programmers greatly appreciated ! I'll do some
>>>>> searching on "policy conflict" in the meantime.....
>>>>>
>>>>>     John
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> ###########################################################################
>>>>> # John Foley                          # Location:  IL93-E1-21S            #
>>>>> # IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
>>>>> # Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
>>>>> # Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
>>>>> # 600 North US Highway 45             #      Fax: (847) 523-5767          #
>>>>> # Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
>>>>> ###########################################################################
>>>>>                (this email sent using Mozilla on Windows)
>>>>>
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=215831
>>>>>
>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=215845

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


