[GE users] Wildcarded PE Name Circumvents Queue Sorting

rems0 Richard.Ems at cape-horn-eng.com
Tue Dec 1 19:24:01 GMT 2009



Hi list,

is this bug (#3021) going to be fixed in 6.2u5?
How are bug-fixing priorities assigned?
Does voting for an issue really help?   ;-)

Please vote for issue 3021!
http://gridengine.sunsource.net/issues/showvotes.cgi?voteon=3021

Could someone talk to or write to an SGE developer about this?
Or is there a developer on the list who could answer?


Many thanks, Richard


On 10/21/2009 03:37 AM, cjf001 wrote:
> Hmm - I've seen something similar, I think.  I submitted a bug report
> on this back in May:
> 
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3021
> 
> I was also using wildcards, but in the queue name rather than the PE
> name, so I'm not sure whether it's the same issue.  It was never
> resolved, so I had to work around it by using hostgroups in place of
> the wildcards.
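> 
> For reference, a hostgroup-based setup along those lines might look
> roughly like this (the hostgroup and queue names here are just
> illustrative):
> 
>   # define a hostgroup, attach it to the queue's hostlist, and then
>   # request the queue by name instead of wildcarding it at submit time
>   qconf -ahgrp @testhosts        # hostlist: node01 node02 ...
>   qconf -mq test.q               # set hostlist to @testhosts
>   qsub -q test.q -pe test_pe 2 sleeper.sh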
> 
> Probably doesn't help you much, but I'd certainly vote for a
> rewrite of the "sge_select_parallel_environment" code!
> 
>       John
> 
> templedf wrote:
>> I have another odd issue.  I have a test config with three queues, 
>> test1, test2, and test3.  I also have three PEs, test_1, test_2, and 
>> test_3.  Each queue has the corresponding PE in its pe_list, e.g. queue 
>> test1 has PE test_1, etc.  I have the queue_sort_method set to "seqno", 
>> and each queue has a seq_no matching the number in its name, e.g. test1 has a 
>> seq_no of 1.  There is a single host in the cluster, and each queue has 
>> 4 slots on that host.  The load_thresholds are set to NONE for all three 
>> queues, and there are no other queues in the system.
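>>
>> For reference, the relevant parts of that setup look roughly like this
>> (abridged qconf output, showing only the fields mentioned above):
>>
>>   # qconf -ssconf   (scheduler configuration)
>>   queue_sort_method    seqno
>>
>>   # qconf -sq test1   (test2 and test3 differ only in seq_no and pe_list)
>>   seq_no               1
>>   slots                4
>>   pe_list              test_1
>>   load_thresholds      NONE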
>>
>> If I submit:
>>
>> qsub -t 1-2 sleeper.sh
>> qsub -t 1-2 sleeper.sh
>> qsub -t 1-2 sleeper.sh
>>
>> the behavior is as expected.  The first two jobs go to test1, and the 
>> third goes to test2.  If, however, I submit:
>>
>> qsub -pe test_\* 2 sleeper.sh
>> qsub -pe test_\* 2 sleeper.sh
>> qsub -pe test_\* 2 sleeper.sh
>>
>> the behavior is unpredictable.  The jobs may land on any of the queues 
>> in any order.  Looking at the schedd_runlog file, it looks like the 
>> wildcarded PE is being used to determine the sort order of the queues, 
>> and the wildcard expansion doesn't always produce the list of PEs in 
>> the same order.  For example:
>>
>> Tue Oct 20 19:42:53 2009|-------------START-SCHEDULER-RUN-------------
>> Tue Oct 20 19:42:53 2009|queue instance 
>> "all.q at daniel-templetons-macbook-pro" dropped because it is disabled
>> Tue Oct 20 19:42:53 2009|queues dropped because they are disabled: 
>> all.q@daniel-templetons-macbook-pro
>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test1" because PE 
>> "test_2" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test3" because PE 
>> "test_2" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test2" because PE 
>> "test_1" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test3" because PE 
>> "test_1" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test1" because PE 
>> "test_3" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 54 cannot run in queue "test2" because PE 
>> "test_3" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test1" because PE 
>> "test_2" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test3" because PE 
>> "test_2" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test2" because PE 
>> "test_1" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test3" because PE 
>> "test_1" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test1" because PE 
>> "test_3" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 55 cannot run in queue "test2" because PE 
>> "test_3" is not in pe list
>> Tue Oct 20 19:42:53 2009|queue instance 
>> "test2 at daniel-templetons-macbook-pro" dropped because it is full
>> Tue Oct 20 19:42:53 2009|queues dropped because they are full: 
>> test2@daniel-templetons-macbook-pro
>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test1" because PE 
>> "test_2" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test3" because PE 
>> "test_2" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in PE "test_2" because it 
>> only offers 0 slots
>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test3" because PE 
>> "test_1" is not in pe list
>> Tue Oct 20 19:42:53 2009|Job 56 cannot run in queue "test1" because PE 
>> "test_3" is not in pe list
>> Tue Oct 20 19:42:53 2009|--------------STOP-SCHEDULER-RUN-------------
>>
>> In this case, the first two jobs went to test2, and the third went to test1.
>>
>> Anyone else seen this before?
>>
>> Aside from reporting the issue, I also need to find a way to get this 
>> working.  What I'm trying to do is have three queues that all offer the 
>> same PE that uses $fill_up behavior.  The queues should be loaded in 
>> seq_no order, and no job should be allowed to span multiple queues.  
>> Jobs must either fit entirely into a single queue, or they can't be 
>> scheduled.  It's fine if a job spills across multiple hosts within a 
>> queue, though.
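>>
>> For context, the PE definitions involved here use $fill_up, roughly
>> like this (a minimal sketch; the slot count is illustrative):
>>
>>   # qconf -sp test_1
>>   pe_name            test_1
>>   slots              4
>>   allocation_rule    $fill_up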
>>
>> Any clever ideas on how to get the desired behavior?
>>
>> Thanks,
>> Daniel
>>
> 
> 


-- 
Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
http://www.cape-horn-eng.com
