[GE users] Wildcarded PE Name Circumvents Queue Sorting

rems0 Richard.Ems at cape-horn-eng.com
Wed Dec 2 09:53:44 GMT 2009


Hi Daniel,

it would be *** REALLY GREAT *** to have this issue fixed soon!

Is u5 coming out soon?
I only managed to upgrade to u4 a few weeks ago! Is there some kind of
roadmap for GE releases available online?

Thanks, Richard

On 12/01/2009 08:50 PM, templedf wrote:
> Richard,
> 
> The issue did not make it into the u5 release.  We'll see what we can do 
> for u6.  Sorry.
> 
> Daniel
> 
> rems0 wrote:
>> Hi list,
>>
>> is this bug (#3021) going to be fixed on 6.2u5?
>> How are priorities for fixing issues/bugs assigned?
>> Does voting for an issue really help?   ;-)
>>
>> Please vote for issue 3021 !!!
>> http://gridengine.sunsource.net/issues/showvotes.cgi?voteon=3021
>>
>> Could someone talk to, write to, or ask an SGE developer?
>> Or could a developer answer here?
>>
>>
>> Many thanks, Richard
>>
>>
>> On 10/21/2009 03:37 AM, cjf001 wrote:
>>   
>>> Hmm - I've seen something similar, I think - I submitted a bug report
>>> back in May on this:
>>>
>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3021
>>>
>>> I was also using wildcards, but not in the pe name - I was using
>>> them in the queue name - so, not sure if it's the same issue
>>> or not. This was never resolved, so I had to do a workaround
>>> which consisted of using hostgroups in place of the wildcards.
>>>
>>> Probably doesn't help you much, but I'd certainly vote for a
>>> rewrite of the "sge_select_parallel_environment" code !
>>>
>>>       John
>>>
>>> templedf wrote:
>>>     
>>>> I have another odd issue.  I have a test config with three queues, 
>>>> test1, test2, and test3.  I also have three PEs, test_1, test_2, and 
>>>> test_3.  Each queue has the corresponding PE in its pe_list, e.g. queue 
>>>> test1 has PE test_1, etc.  I have the queue_sort_method set to "seqno", 
>>>> and each queue has a seq_no equal to the number in its name, e.g. queue test1 has a 
>>>> seq_no of 1.  There is a single host in the cluster, and each queue has 
>>>> 4 slots on that host.  The load_thresholds are set to NONE for all three 
>>>> queues, and there are no other queues in the system.
>>>>
>>>> If I submit:
>>>>
>>>> qsub -t 1-2 sleeper.sh
>>>> qsub -t 1-2 sleeper.sh
>>>> qsub -t 1-2 sleeper.sh
>>>>
>>>> the behavior is as expected.  The first two jobs go to test1, and the 
>>>> third goes to test2.  If, however, I submit:
>>>>
>>>> qsub -pe test_\* 2 sleeper.sh
>>>> qsub -pe test_\* 2 sleeper.sh
>>>> qsub -pe test_\* 2 sleeper.sh
>>>>
>>>> the behavior is undefined.  The jobs may land on any queues in any 
>>>> order.  Looking at the schedd_runlog file, it looks like the wildcarded 
>>>> PE is being used for the sort order of the queues, and the wildcard 
>>>> statement doesn't always create the list of PEs in the same order.  For 
>>>> example:
>>>>
>>>> Tue Oct 20 19:42:53 2009|-------------START-SCHEDULER-RUN-------------
>>>> Tue Oct 20 19:42:53 2009|queue instance 
>>>> "all.q@daniel-templetons-macbook-pro" dropped because it is disabled
>>>> Tue Oct 20 19:42:53 2009|queues dropped because they are disabled: 
>>>> all.q@daniel-templetons-macbook-pro
>>>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test1" because PE 
>>>> "test_2" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test3" because PE 
>>>> "test_2" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test2" because PE 
>>>> "test_1" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test3" because PE 
>>>> "test_1" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test1" because PE 
>>>> "test_3" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test2" because PE 
>>>> "test_3" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test1" because PE 
>>>> "test_2" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test3" because PE 
>>>> "test_2" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test2" because PE 
>>>> "test_1" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test3" because PE 
>>>> "test_1" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test1" because PE 
>>>> "test_3" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test2" because PE 
>>>> "test_3" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|queue instance 
>>>> "test2@daniel-templetons-macbook-pro" dropped because it is full
>>>> Tue Oct 20 19:42:53 2009|queues dropped because they are full: 
>>>> test2@daniel-templetons-macbook-pro
>>>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test1" because PE 
>>>> "test_2" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test3" because PE 
>>>> "test_2" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in PE "test_2" because it 
>>>> only offers 0 slots
>>>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test3" because PE 
>>>> "test_1" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test1" because PE 
>>>> "test_3" is not in pe list
>>>> Tue Oct 20 19:42:53 2009|--------------STOP-SCHEDULER-RUN-------------
>>>>
>>>> In this case, the first two jobs went to test2, and the third went to test1.
>>>>
>>>> Anyone else seen this before?
>>>>
>>>> Aside from reporting the issue, I also need to find a way to get this 
>>>> working.  What I'm trying to do is have three queues that all offer the 
>>>> same PE that uses $fill_up behavior.  The queues should be loaded in 
>>>> seq_no order, and no job should be allowed to span multiple queues.  
>>>> Jobs must either fit entirely into a single queue, or they can't be 
>>>> scheduled.  It's fine if a job spills across multiple hosts within a 
>>>> queue, though.
>>>>
>>>> Any clever ideas on how to get the desired behavior?
>>>>
>>>> Thanks,
>>>> Daniel
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=222465
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>       
>>>     
>>
>>
>>
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=230782
> 
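
A note on reading the schedd_runlog excerpt quoted above: the order in which PE names first show up for each job hints at the order in which the wildcard was expanded. What follows is only a minimal Python sketch for pulling that order out of a log. It is not part of Grid Engine, the function name pe_trial_order is made up for illustration, and it assumes each log entry sits on a single line (unlike the wrapped lines in this mail).

```python
import re

# Matches schedd_runlog entries of the form shown in the quoted log, e.g.
#   ...|Job 54 cannot run in queue "test1" because PE "test_2" is not in pe list
# Assumption: one entry per line, i.e. the mail's line wrapping has been undone.
PE_REJECT = re.compile(
    r'\|Job (\d+) cannot run in queue "([^"]+)" '
    r'because PE "([^"]+)" is not in pe list'
)


def pe_trial_order(lines):
    """Return {job_id: [PE names in the order they first appear in the log]}."""
    order = {}
    for line in lines:
        match = PE_REJECT.search(line)
        if match is None:
            continue  # skip START/STOP markers, "dropped" lines, etc.
        job_id, _queue, pe_name = match.groups()
        seen = order.setdefault(int(job_id), [])
        if pe_name not in seen:
            seen.append(pe_name)
    return order
```

Fed the unwrapped entries from the excerpt above, this reports test_2, test_1, test_3 for jobs 54, 55 and 56, which fits the observation that the first two jobs landed in test2 rather than test1.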

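The behavior Daniel asks for (queues filled in seq_no order, and a job either fits entirely into one queue or is not scheduled) can be written down as a small first-fit model. This is a toy Python illustration of the intended policy only, not of how the Grid Engine scheduler is implemented; place_jobs and its tuple layout are hypothetical.

```python
def place_jobs(queues, jobs):
    """Toy first-fit placement in seq_no order.

    queues: list of (name, seq_no, free_slots) tuples (hypothetical layout)
    jobs:   list of slot counts, one per job, in submission order
    Returns one entry per job: the chosen queue name, or None if the job
    does not fit entirely into any single queue.
    """
    free = {name: slots for name, _seq_no, slots in queues}
    # Lower seq_no is tried first, mirroring queue_sort_method "seqno".
    ordered = [name for name, _seq_no, _slots in sorted(queues, key=lambda q: q[1])]

    placements = []
    for needed in jobs:
        # A job never spans queues: take the first queue that holds all of it.
        chosen = next((name for name in ordered if free[name] >= needed), None)
        if chosen is not None:
            free[chosen] -= needed
        placements.append(chosen)
    return placements
```

With three 4-slot queues and three 2-slot jobs, this model gives test1, test1, test2, the same placement reported above for the jobs submitted without a PE. A job asking for more slots than any one queue has free is left unscheduled (None).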

-- 
Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=230918