[GE users] SMP trouble

Reuti reuti at staff.uni-marburg.de
Mon Sep 1 16:57:19 BST 2008


Hi,

On 01.09.2008 at 17:51, Luca Tola wrote:

> Reuti wrote:
>> Hi,
>>
>> On 01.09.2008 at 16:37, Luca Tola wrote:
>>
>>> Hi all,
>>> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
>>> I've created my PE with qconf -ap smp4 :
>>> pe_name smp4
>>> slots 4
>>
>> This is the total number of slots for this PE across all hosts. Most
>> often you will set this to a higher value. The limit per node is
>> in the queue definition, which you set correctly to 4 there.
>>
>> How did you start your parallel application? Did you check the load
>> with "top" or "uptime"? Is your parallel app 100% parallel?
>>
> I check the CPU load on the execution host with the "top" command.
>
> What do you mean by parallel? In the traditional way, when I run any
> script on the cluster, the load is split four ways across the CPUs,
> so every CPU works at only 25%.

With a real parallel job, every CPU should be used at 100% and the
load of the machine should be around 4.00.
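
One rough way to check this on a Linux host is to compare the 1-minute
load average with the number of slots requested. The sketch below is an
illustration only: the function name load1 and its optional file argument
are made up for the example, and it assumes the load figures are readable
from /proc/loadavg.

```shell
#!/bin/sh
# Print the 1-minute load average.
# Assumption: Linux, where the first field of /proc/loadavg is the 1-minute load.
# load1 and its optional file argument are hypothetical helpers for this example.
load1() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

# Only report when the file exists, so the sketch is harmless elsewhere.
if [ -r /proc/loadavg ]; then
    echo "1-minute load: $(load1) (expect about 4.00 for a busy 4-thread job)"
fi
```

Run on the execution host while the job is active, a value near 1.00
would suggest that only one thread is doing work.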

> With SGE, 3 CPUs don't work, but 1 works at 100%.

It is up to the Linux scheduler to put the process on the core with
the best possible cache hits, or to use other criteria. Whether it's
just one core or several shouldn't affect the wallclock runtime of
the job.

-- Reuti


> I'd really like to be able to use all of my CPUs instead of one of  
> four.
>
>
>> Only Open MPI includes some automatism to detect the machinefile
>> and the number of cores on its own. With other parallel libraries,
>> you have to tell your program the number of forks or threads.
>>
>
> I did. In my run_eng.sh I've written:
> ...
> ...
> export OMP_NUM_THREADS = 4
> ...
> ...
>
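If the line appears in run_eng.sh exactly as quoted, a Bourne-type shell
will not accept it: assignments take no spaces around "=". A minimal
sketch of a job script header follows, assuming the script is interpreted
by /bin/sh or bash and that the application honours OMP_NUM_THREADS.
NSLOTS is the variable Grid Engine sets to the number of granted slots;
the fallback of 1 is just a choice for this example.

```shell
#!/bin/sh
#$ -S /bin/sh
# The "#$" line asks Grid Engine to run this script with /bin/sh
# (assumption: without it the queue's configured shell may be used).

# No spaces around "=" in a Bourne-shell assignment.
# NSLOTS is set by Grid Engine to the number of slots granted to the job;
# the fallback of 1 applies when the script is run outside the batch system.
OMP_NUM_THREADS="${NSLOTS:-1}"
export OMP_NUM_THREADS

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Submitted with "-pe smp4 4", this would request four threads without
hard-coding the number in the script.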
>> What parallel lib are you using?
>>
>
> I don't know. How can I check?
>
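One way to get a hint, sketched here under assumptions: on Linux, ldd
lists the shared libraries a dynamically linked binary loads, and their
names often reveal the parallel runtime. The function name parallel_libs
and the pattern list are choices for this example, not an exhaustive
catalogue, and a statically linked binary will show nothing.

```shell
#!/bin/sh
# Filter ldd-style output (read from stdin) for common parallel runtimes.
# parallel_libs is a hypothetical helper; the pattern list is illustrative:
#   libgomp / libiomp5 -> OpenMP (GNU / Intel)
#   libmpi*            -> an MPI implementation
#   libpthread         -> POSIX threads
parallel_libs() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'lib(gomp|iomp5|mpi|pthread)' || true
}

# Usage (the binary path is a placeholder, not taken from the thread):
#   ldd /path/to/your/binary | parallel_libs
```

A line mentioning libgomp or libiomp5 would point to OpenMP, in which
case OMP_NUM_THREADS is the relevant setting.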
> Thanks for all
>
>> -- Reuti
>>
>>
>>> user_lists NONE
>>> xuser_lists NONE
>>> start_proc_args NONE
>>> stop_proc_args NONE
>>> allocation_rule $pe_slots
>>> control_slaves FALSE
>>> job_is_first_task TRUE
>>> urgency_slots max
>>>
>>> My queue configuration is:
>>> qname calcolo
>>> hostlist cluster
>>> seq_no 0
>>> load_thresholds np_load_avg=1.75
>>> suspend_thresholds NONE
>>> nsuspend 1
>>> suspend_interval 00:05:00
>>> priority 0
>>> min_cpu_interval 00:05:00
>>> processors 4
>>> qtype BATCH INTERACTIVE
>>> ckpt_list NONE
>>> pe_list make smp4
>>> rerun FALSE
>>> slots 4
>>> tmpdir /tmp
>>> shell /bin/csh
>>> prolog NONE
>>> epilog NONE
>>> ...
>>> ...
>>> ...
>>> default
>>> ...
>>> ...
>>>
>>> When I try to submit with:
>>> qsub -q calcolo -pe smp4 4 run_eng.sh
>>> the job runs but on only one CPU instead of all CPU in my  
>>> execution host.
>>>
>>> qstat -f
>>>
>>> queuename                    qtype used/tot. load_avg arch       states
>>> ----------------------------------------------------------------------------
>>> all.q@PING6                  BIP   0/1       -NA-     lx24-x86   au
>>> ----------------------------------------------------------------------------
>>> all.q@cluster                BIP   0/4       0.03     lx24-amd64
>>> ----------------------------------------------------------------------------
>>> all.q@sge-glexec1            BIP   0/1       0.00     lx24-x86
>>> ----------------------------------------------------------------------------
>>> calcolo@cluster              BIP   0/4       0.03     lx24-amd64
>>> ----------------------------------------------------------------------------
>>> post_processing@sge-glexec1  BIP   0/1       0.00     lx24-x86
>>>
>>> Can somebody tell me where I'm making an error?
>>>
>>> Thanks
>>> Luca


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



