[GE users] Job fragmentation of sge 6.0u8

Reuti reuti at staff.uni-marburg.de
Tue Jul 29 12:17:00 BST 2008


On 28.07.2008 at 21:49, Alessio Comisso wrote:

>
> On 28/Jul/08, at 18:29, Reuti wrote:
>
>> On 28.07.2008 at 18:29, Alessio Comisso wrote:
>>
>>> On 28/Jul/08, at 17:14, Reuti wrote:
>>>
>>> Hello,
>>>> Hi,
>>>>
>>>> On 28.07.2008 at 17:28, Alessio Comisso wrote:
>>>>
>>>>> Dear all,
>>>>> This is the first time I am writing to this forum; I hope it is
>>>>> the right place to ask for support.
>>>>
>>>> yes - sure :-)
>>>>
>>>>> I have a cluster running MPI jobs, but it happens that several jobs
>>>>> are scheduled on the same node; for instance, qstat -f gives:
>>>>>
>>>>> infini.q@node089.beowulf.clust BIP   4/4       4.00     lx26-amd64
>>>>>   10568 0.53560 x88-PTCDA- toton        r     07/27/2008 08:50:15     2
>>>>>   10632 0.62232 test-PAW   levita       r     07/28/2008 12:53:39     2
>>>>
>>>> You defined 4 slots on this node, and SGE scheduled 4 tasks to it.
>>>>
>>>>>
>>>>> This inhomogeneous usage is not optimal, as the CPU usage is
>>>>> very low.
>>>>>
>>>>>                                          %CPU
>>>>> 22190 levita    25   0  289m 182m 5772 R 53.3  2.3 108:12.66 pwcapablanca.x
>>>>> 22191 levita    25   0  289m 189m  13m R 51.3  2.4 110:20.80 pwcapablanca.x
>>>>> 12688 toton     25   0  459m 411m  13m R 48.6  5.2   1031:20 siesta
>>>>> 12689 toton     25   0  207m 147m 6072 R 46.6  1.9   1037:39 siesta
>>>>
>>>> So it looks like the machine has only 2 cores, not 4. What must
>>>> be adjusted, then, is the number of slots in the queue configuration:
>>>
>>> No, no, the machine has 4 cores; I think the communication
>>> patterns are limiting the performance. If the jobs are
>>> homogeneous, you get 4 processes at 99.9%.
>>
>> On the one hand this would explain why 4 slots are defined for
>> this node. But everything seems to be in order. Do you want two
>> cores idling on this machine? What is the allocation rule of the
>> requested PE?
>
> Yes; if the node occupation is homogeneous, the performance is much
> better.
>
>>
>> One way to get a complete node for a job is to set
>> "allocation_rule $pe_slots" and to request all the memory of the
>> machine through a proper setup of "virtual_free" or "h_vmem",
>> whichever you choose. As all the memory is then allocated to one
>> job, nothing else will be scheduled there, although slots are free.
>
> Thanks; currently the allocation_rule is $fill_up. Do I need to use
> the virtual_free thing?

If you always request fewer cores than are installed per system and
want them on one and the same machine for sure, then the best is to
set the allocation rule to $pe_slots.
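To illustrate the difference between the two rules, here is a deliberately simplified Python model. It is not SGE's actual scheduler (which also weighs load, queue sequence numbers and more); the function allocate and the host names are hypothetical.

```python
# Simplified, hypothetical model of two SGE PE allocation rules.
# NOT the real scheduler: hosts are just tried in dictionary order.
from typing import Dict, Optional


def allocate(free_slots: Dict[str, int], requested: int, rule: str) -> Optional[Dict[str, int]]:
    """Return {host: slots} for a parallel job, or None if it cannot start."""
    if rule == "$pe_slots":
        # All requested slots must come from one single host.
        for host, free in free_slots.items():
            if free >= requested:
                return {host: requested}
        return None
    if rule == "$fill_up":
        # Take every free slot on a host before moving on to the next one.
        grant: Dict[str, int] = {}
        remaining = requested
        for host, free in free_slots.items():
            if remaining == 0:
                break
            take = min(free, remaining)
            if take > 0:
                grant[host] = take
                remaining -= take
        return grant if remaining == 0 else None
    raise ValueError(f"unsupported rule: {rule}")


hosts = {"node088": 1, "node089": 2, "node090": 4}  # free slots per host
print(allocate(hosts, 3, "$fill_up"))   # {'node088': 1, 'node089': 2}
print(allocate(hosts, 3, "$pe_slots"))  # {'node090': 3}
print(allocate(hosts, 2, "$pe_slots"))  # {'node089': 2}
```

The last line shows that $pe_slots alone keeps a job on one host but does not make that host exclusive: a 2-slot job can still land next to another job, which is what the two measures below address.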

a) You will need virtual_free to prevent additional jobs from being
scheduled to this machine. Hence make it consumable in the complex
definition (qconf -mc), set a default request of e.g. 250m, and set
the memory actually installed in each machine in its exec host
definition (qconf -me <hostname>). Then submit the job with half of
the amount, as the request is counted per slot, e.g.
"qsub -pe <pe_name> 2 -l virtual_free=500m jobscript" for a machine
where virtual_free=1000m is defined.
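The arithmetic behind "half of the amount": assuming, as in the 6.0 series, that a consumable request of a parallel job is debited once per granted slot, 2 slots x 500m use up the full 1000m of the host. A minimal Python sketch of that bookkeeping, with a hypothetical Host class and all values in MB:

```python
# Hypothetical model of a per-slot consumable (like virtual_free in SGE 6.0):
# a parallel job's request is debited once for every slot granted on a host.


class Host:
    """Simplified exec host: free slots plus a consumable memory pool in MB."""

    def __init__(self, name: str, slots: int, virtual_free_mb: int) -> None:
        self.name = name
        self.free_slots = slots
        self.free_mem_mb = virtual_free_mb

    def fits(self, slots: int, mem_per_slot_mb: int) -> bool:
        # Both the slots and (per-slot request x slot count) must be available.
        return slots <= self.free_slots and slots * mem_per_slot_mb <= self.free_mem_mb

    def start(self, slots: int, mem_per_slot_mb: int) -> None:
        if not self.fits(slots, mem_per_slot_mb):
            raise RuntimeError(f"job does not fit on {self.name}")
        self.free_slots -= slots
        self.free_mem_mb -= slots * mem_per_slot_mb


node = Host("node089", slots=4, virtual_free_mb=1000)  # exec host: virtual_free=1000m
node.start(slots=2, mem_per_slot_mb=500)                 # 2 slots, -l virtual_free=500m
print(node.free_slots, node.free_mem_mb)                 # 2 0: slots left, memory gone
print(node.fits(slots=1, mem_per_slot_mb=250))           # False: default 250m is blocked
```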

or b) request 4 slots and simply ignore 2 of the slots you were
granted. As they are all on one and the same machine (because of
$pe_slots), you know you can use the complete machine.
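For b), the job script then has to start fewer processes than it was granted. A minimal sketch, assuming the usual $PE_HOSTFILE layout of one "host slots queue processor_range" line per host; slots_per_host, processes_to_start and wanted=2 are hypothetical names and values, and the returned number would be passed to mpirun (e.g. its -np option) instead of $NSLOTS.

```python
# Hypothetical helper for option b): check that all granted slots sit on one
# host, then start only the number of processes actually wanted.
import os
from typing import Dict


def slots_per_host(pe_hostfile_text: str) -> Dict[str, int]:
    """Parse $PE_HOSTFILE content: 'host slots queue processor_range' per line."""
    grant: Dict[str, int] = {}
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            grant[fields[0]] = grant.get(fields[0], 0) + int(fields[1])
    return grant


def processes_to_start(grant: Dict[str, int], wanted: int) -> int:
    """Require a single host and return how many processes to launch there."""
    if len(grant) != 1:
        raise RuntimeError(f"expected a single host, got {sorted(grant)}")
    (granted,) = grant.values()
    if wanted > granted:
        raise RuntimeError(f"wanted {wanted} processes but only {granted} slots granted")
    return wanted


if __name__ == "__main__":
    path = os.environ.get("PE_HOSTFILE")  # set by SGE inside a parallel job
    if path:
        with open(path) as fh:
            print(processes_to_start(slots_per_host(fh.read()), wanted=2))
```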

-- Reuti
