[GE users] SMP trouble

Reuti reuti at staff.uni-marburg.de
Mon Sep 1 17:12:07 BST 2008


On 01.09.2008 at 18:09, Luca Tola wrote:

> Reuti wrote:
>> Hi,
>>
>> On 01.09.2008 at 17:51, Luca Tola wrote:
>>
>>> Reuti wrote:
>>>> Hi,
>>>>
>>>> On 01.09.2008 at 16:37, Luca Tola wrote:
>>>>
>>>>> Hi all,
>>>>> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
>>>>> I created my PE with qconf -ap smp4:
>>>>> pe_name smp4
>>>>> slots 4
>>>>
>>>> this is the number of slots for this PE across all hosts. Most  
>>>> often you will set this to a higher value. The limit per node is  
>>>> in the queue definition, which you set correctly to 4 there.
>>>>
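To illustrate the two separate `slots` settings, here is a sketch that prints a PE definition with a larger cluster-wide slot count, using the field names quoted in this thread. `write_pe_config` and the value 999 are arbitrary choices for illustration; the per-host limit would still come from the queue's own `slots 4`. Newer Grid Engine releases may expect additional PE fields, so compare against the output of `qconf -sp smp4` before loading such a file.

```shell
#!/bin/sh
# Sketch: print a parallel environment (PE) definition in qconf's
# "attribute value" file format. Field names are the ones quoted in this
# thread; write_pe_config is a hypothetical helper name.
write_pe_config() {
    pe_name=$1
    total_slots=$2   # limit across ALL hosts, not per host
    cat <<EOF
pe_name            $pe_name
slots              $total_slots
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    \$pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      max
EOF
}

# Usage (as a Grid Engine manager):
#   write_pe_config smp4 999 > /tmp/smp4.pe
#   qconf -Ap /tmp/smp4.pe     # add a new PE from the file
#   qconf -Mp /tmp/smp4.pe     # or modify an existing PE from the file
```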
>>>> How did you start your parallel application? Did you check the
>>>> load with "top" or "uptime"? Is your application 100% parallel?
>>>>
>>> I checked the CPU load on the execution host with the "top" command.
>>>
>>> What do you mean by parallel? When I run any script on the cluster
>>> the traditional way, the load is split 4 ways across the CPUs, so
>>> every CPU works at only 25%.
>>
>> With a truly parallel job, every CPU should be used at 100% and the
>> load average of the machine should be around 4.00.
>>
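The relation between load average and core count described above can be checked numerically. A minimal Linux sketch, assuming `/proc/loadavg` (whose first field is the 1-minute load average) and `getconf _NPROCESSORS_ONLN` for the number of online CPUs; `load_vs_cpus` is a hypothetical helper that divides one by the other.

```shell
#!/bin/sh
# Sketch: divide the 1-minute load average by the number of CPUs.
# load_vs_cpus is a hypothetical helper; it takes the /proc/loadavg text
# and the CPU count as arguments so it can be tested without a live system.
load_vs_cpus() {
    loadavg_line=$1
    ncpus=$2
    # awk does the floating-point division that POSIX sh cannot.
    echo "$loadavg_line" | awk -v n="$ncpus" '{ printf "%.2f\n", $1 / n }'
}

# Usage on the execution host while the job runs:
#   load_vs_cpus "$(cat /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
# A value near 1.00 means all CPUs are busy; around 0.25 on a 4-CPU host
# means only one CPU is doing the work.
```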
>
> Sorry, you're right.
> When I run any script the traditional way, every CPU is used at 100%.

Aha, what is in the script? Does it define any environment variables?

-- Reuti


>>> With SGE, 3 CPUs sit idle and only 1 works at 100%.
>>
>> This is up to the Linux scheduler, which places the process on the
>> core with the best possible cache hits or according to other
>> criteria. Whether it uses just one core or several shouldn't affect
>> the wallclock runtime of the job.
>>
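Whichever core the scheduler picks, one way to see whether the application actually started several threads is to check its thread count while it runs (in `top`, pressing `H` toggles a per-thread view and `1` shows per-CPU usage). A minimal Linux-only sketch reading the `Threads:` line of `/proc/<pid>/status`; `thread_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the number of threads of a running process (Linux only),
# taken from the "Threads:" line of /proc/<pid>/status.
# thread_count is a hypothetical helper name.
thread_count() {
    pid=$1
    awk '/^Threads:/ { print $2 }' "/proc/$pid/status"
}

# Usage: find the application's PID with top or ps, then run
#   thread_count 12345
# An OpenMP program honoring OMP_NUM_THREADS=4 will usually report at
# least 4 once it is inside a parallel region; 1 means it runs serially.
```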
>> -- Reuti
>>
>>
>>> I'd really like to be able to use all of my CPUs instead of one  
>>> of four.
>>>
>>>
>>>> Only Open MPI includes some automatism to detect the machinefile
>>>> and the number of cores on its own. With other parallel libraries,
>>>> you have to tell your program the number of forks or threads.
>>>>
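For work that is parallelized by forking processes rather than threads, "telling the program the number of forks" can be as simple as starting one worker per granted slot. A minimal sketch, assuming Grid Engine exports `NSLOTS` (the number of slots granted to the job); `worker` and `run_workers` are hypothetical stand-ins for the real computation.

```shell
#!/bin/sh
# Sketch: start one background worker per granted slot, then wait for all.
# NSLOTS is set by Grid Engine inside a job; default to 1 outside SGE.
# "worker" is a hypothetical placeholder for the real computation.
worker() {
    echo "worker $1 finished"
}

run_workers() {
    n=${NSLOTS:-1}
    i=1
    while [ "$i" -le "$n" ]; do
        worker "$i" &      # each call runs in its own background process
        i=$((i + 1))
    done
    wait                   # block until every worker has exited
}
```

Using `NSLOTS` instead of a hard-coded 4 keeps the script consistent with whatever slot count is requested with `-pe smp4 N`.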
>>>
>>> I did. In my run_eng.sh I've written:
>>> ...
>>> ...
>>> export OMP_NUM_THREADS = 4
>>> ...
>>> ...
>>>
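Two details may explain why that setting has no effect. In POSIX `sh` and `bash`, `export OMP_NUM_THREADS = 4` (with spaces around `=`) does not assign 4: `=` and `4` are rejected as invalid variable names, and some shells abort the script at that point. Also, the queue configuration quoted in this thread sets `shell /bin/csh`; if the cluster uses `shell_start_mode posix_compliant`, Grid Engine may run the script with `csh`, which has no `export` builtin (the csh form is `setenv OMP_NUM_THREADS 4`). A minimal sketch of a job-script header that sidesteps both, assuming the program is an OpenMP application and that Grid Engine sets `NSLOTS` to the number of granted slots; `set_omp_threads` and the commented-out program name are hypothetical.

```shell
#!/bin/sh
# Sketch of a job-script header; the real run_eng.sh is not shown in full.
#$ -S /bin/sh        # ask Grid Engine for a POSIX shell instead of the queue's /bin/csh
#$ -pe smp4 4        # PE name and slot count from this thread
#$ -q calcolo

# Hypothetical helper: one OpenMP thread per granted slot.
# NSLOTS is set by Grid Engine; fall back to 1 when run outside a job.
set_omp_threads() {
    OMP_NUM_THREADS=${NSLOTS:-1}   # no spaces around "="
    export OMP_NUM_THREADS
}

set_omp_threads
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# ./your_program     # hypothetical: the actual application is not named here
```

The `#$` lines are read by `qsub` as embedded options, so the job could then be submitted with just `qsub run_eng.sh`; passing `-S /bin/sh` on the `qsub` command line works as well.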
>>>> What parallel lib are you using?
>>>>
>>>
>>> I don't know. How can I check?
>>>
>>> Thanks for everything
>>>
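One way to answer "which parallel library?" is to list the shared libraries the program's binary links against with `ldd`. A rough sketch, assuming the application is a dynamically linked executable; the patterns are common library names (GNU `libgomp` and Intel `libiomp5` for OpenMP, `libmpi` for MPI implementations such as Open MPI, `libpthread` for plain POSIX threads), not an exhaustive list, and `classify_libs` is a hypothetical helper name. A statically linked binary, or a shell script such as run_eng.sh itself, reveals nothing this way; run `ldd` on the executable the script starts.

```shell
#!/bin/sh
# Sketch: read `ldd` output on stdin and name the parallel runtimes it mentions.
# The patterns are common library names only, not an exhaustive list.
classify_libs() {
    found=0
    while IFS= read -r line; do
        case $line in
            *libgomp*)    echo "OpenMP (GNU libgomp)"; found=1 ;;
            *libiomp5*)   echo "OpenMP (Intel libiomp5)"; found=1 ;;
            *libmpi*)     echo "MPI (libmpi*)"; found=1 ;;
            *libpthread*) echo "POSIX threads (libpthread)"; found=1 ;;
        esac
    done
    [ "$found" -eq 1 ] || echo "no known parallel runtime found"
}

# Usage (the path is hypothetical; use the program run_eng.sh starts):
#   ldd /path/to/your_program | classify_libs
```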
>>>> -- Reuti
>>>>
>>>>
>>>>> user_lists NONE
>>>>> xuser_lists NONE
>>>>> start_proc_args NONE
>>>>> stop_proc_args NONE
>>>>> allocation_rule $pe_slots
>>>>> control_slaves FALSE
>>>>> job_is_first_task TRUE
>>>>> urgency_slots max
>>>>>
>>>>> My queue configuration is:
>>>>> qname calcolo
>>>>> hostlist cluster
>>>>> seq_no 0
>>>>> load_thresholds np_load_avg=1.75
>>>>> suspend_thresholds NONE
>>>>> nsuspend 1
>>>>> suspend_interval 00:05:00
>>>>> priority 0
>>>>> min_cpu_interval 00:05:00
>>>>> processors 4
>>>>> qtype BATCH INTERACTIVE
>>>>> ckpt_list NONE
>>>>> pe_list make smp4
>>>>> rerun FALSE
>>>>> slots 4
>>>>> tmpdir /tmp
>>>>> shell /bin/csh
>>>>> prolog NONE
>>>>> epilog NONE
>>>>> ...
>>>>> ...
>>>>> ...
>>>>> default
>>>>> ...
>>>>> ...
>>>>>
>>>>> When I try to submit with:
>>>>> qsub -q calcolo -pe smp4 4 run_eng.sh
>>>>> the job runs, but only on one CPU instead of on all the CPUs of my
>>>>> execution host.
>>>>>
>>>>> qstat -f
>>>>>
>>>>> queuename                     qtype used/tot. load_avg arch          states
>>>>> ----------------------------------------------------------------------------
>>>>> all.q@PING6                   BIP   0/1       -NA-     lx24-x86      au
>>>>> ----------------------------------------------------------------------------
>>>>> all.q@cluster                 BIP   0/4       0.03     lx24-amd64
>>>>> ----------------------------------------------------------------------------
>>>>> all.q@sge-glexec1             BIP   0/1       0.00     lx24-x86
>>>>> ----------------------------------------------------------------------------
>>>>> calcolo@cluster               BIP   0/4       0.03     lx24-amd64
>>>>> ----------------------------------------------------------------------------
>>>>> post_processing@sge-glexec1   BIP   0/1       0.00     lx24-x86
>>>>>
>>>>> Can somebody tell me where I'm making a mistake?
>>>>>
>>>>> Thanks
>>>>> Luca


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net