[GE users] SMP trouble

Luca Tola luca.tola at phitecingegneria.it
Mon Sep 1 17:09:50 BST 2008


Reuti wrote:
> Hi,
>
> On 01.09.2008 at 17:51, Luca Tola wrote:
>
>> Reuti wrote:
>>> Hi,
>>>
>>> On 01.09.2008 at 16:37, Luca Tola wrote:
>>>
>>>> Hi all,
>>>> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
>>>> I've created my PE with qconf -ap smp4:
>>>> pe_name smp4
>>>> slots 4
>>>
>>> this is the number of slots for this PE across all hosts. Most often 
>>> you will set this to a higher value. The limit per node is in the 
>>> queue definition, which you set correctly to 4 there.
>>>
>>> How did you start your parallel application, and did you check the load 
>>> with "top" or "uptime"? Is your parallel app 100% parallel?
>>>
>> With the "top" command I check the load of the CPUs on the execution host.
>>
>> What do you mean by parallel? In the traditional way, when I run any 
>> script on the cluster, the load is split four ways across the CPUs, so 
>> every CPU works at only 25%.
>
> With a real parallel job, every CPU should be used at 100% and the 
> load of the machine should be at around 4.00.
>

Sorry, you're right.
When I run any script in the traditional way, every CPU is used at 100%.
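
For comparing the machine load with the number of requested slots, a rough sketch could look like the one below. The helper name load_matches_slots and the 90% cut-off are illustrative assumptions, not part of Grid Engine, and /proc/loadavg is Linux-specific.

```shell
# load_matches_slots LOAD SLOTS
# Succeeds when LOAD is at least 90% of SLOTS. The 90% cut-off is an
# arbitrary assumption for illustration, not a Grid Engine rule.
load_matches_slots() {
    awk -v load="$1" -v slots="$2" 'BEGIN { exit !(load >= 0.9 * slots) }'
}

# Example: compare the 1-minute load average with a 4-slot request.
if [ -r /proc/loadavg ]; then
    load=$(cut -d' ' -f1 /proc/loadavg)
    if load_matches_slots "$load" 4; then
        echo "load $load: all 4 slots look busy"
    else
        echo "load $load: fewer than 4 cores appear to be in use"
    fi
fi
```

The load average also counts processes unrelated to the job, so on a shared host this is only an indication.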

>> With SGE, three CPUs stay idle while one works at 100%.
>
> This is up to the Linux scheduler, which places the process on the core 
> with the best possible cache hits or according to other criteria. Whether 
> it's just one core or several shouldn't affect the wallclock 
> runtime of the job.
>
> -- Reuti
>
>
>> I'd really like to be able to use all of my CPUs instead of one of four.
>>
>>
>>> Only in the case of Open MPI is some automation included to detect the 
>>> machinefile and number of cores on its own. With other parallel 
>>> libraries, you have to tell your program the number of forks or 
>>> threads.
>>>
>>
>> I did. In my run_eng.sh I've written:
>> ...
>> ...
>> export OMP_NUM_THREADS=4
>> ...
>> ...
>>
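For reference, a minimal sketch of how the thread count could be set in a Bourne-shell job script. It assumes the program is OpenMP-based, which is not confirmed in this thread, and that the script is interpreted by /bin/sh. NSLOTS is the variable Grid Engine sets to the number of granted slots. Under csh, which the queue lists as its shell, the equivalent would be "setenv OMP_NUM_THREADS $NSLOTS".

```shell
#!/bin/sh
#$ -S /bin/sh
# The "#$ -S" line asks Grid Engine to run this job with /bin/sh.
# Assumption: without it the queue's csh may interpret the script,
# and csh has no "export" builtin.

# Grid Engine sets NSLOTS inside a parallel job. Fall back to 1 so the
# script also runs by hand outside the scheduler.
OMP_NUM_THREADS="${NSLOTS:-1}"   # no spaces around "="
export OMP_NUM_THREADS

echo "using $OMP_NUM_THREADS thread(s)"
```

Taking the value from NSLOTS rather than hard-coding 4 keeps the thread count in step with whatever is requested with -pe.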
>>> What parallel lib are you using?
>>>
>>
>> I don't know. How can I check?
>>
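One possible way to check, assuming the application is a dynamically linked binary: list its shared libraries with ldd and look for well-known parallel runtimes. The helper name classify_libs and the binary path in the usage line are hypothetical. A statically linked program will not show anything here.

```shell
# classify_libs: reads `ldd` output on stdin and prints the parallel
# runtimes it recognises, one per line.
classify_libs() {
    awk '
        /libgomp|libiomp5|libomp/ { print "OpenMP" }    # GNU, Intel, LLVM OpenMP runtimes
        /libmpi|libmpich/         { print "MPI" }       # Open MPI, MPICH
        /libpthread/              { print "pthreads" }  # plain POSIX threads
    ' | sort -u
}

# Usage (the path is a placeholder for the real solver binary):
#   ldd /path/to/solver | classify_libs
```

If the output includes "OpenMP", OMP_NUM_THREADS is the setting that matters. If only "MPI" shows up, the program needs to be started through its MPI launcher instead.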
>> Thanks for everything
>>
>>> -- Reuti
>>>
>>>
>>>> user_lists NONE
>>>> xuser_lists NONE
>>>> start_proc_args NONE
>>>> stop_proc_args NONE
>>>> allocation_rule $pe_slots
>>>> control_slaves FALSE
>>>> job_is_first_task TRUE
>>>> urgency_slots max
>>>>
>>>> My queue configuration is:
>>>> qname calcolo
>>>> hostlist cluster
>>>> seq_no 0
>>>> load_thresholds np_load_avg=1.75
>>>> suspend_thresholds NONE
>>>> nsuspend 1
>>>> suspend_interval 00:05:00
>>>> priority 0
>>>> min_cpu_interval 00:05:00
>>>> processors 4
>>>> qtype BATCH INTERACTIVE
>>>> ckpt_list NONE
>>>> pe_list make smp4
>>>> rerun FALSE
>>>> slots 4
>>>> tmpdir /tmp
>>>> shell /bin/csh
>>>> prolog NONE
>>>> epilog NONE
>>>> ...
>>>> ...
>>>> ...
>>>> default
>>>> ...
>>>> ...
>>>>
>>>> When I try to submit with:
>>>> qsub -q calcolo -pe smp4 4 run_eng.sh
>>>> the job runs, but on only one CPU instead of all the CPUs on my 
>>>> execution host.
>>>>
>>>> qstat -f
>>>>
>>>> queuename                      qtype used/tot. load_avg arch          states
>>>> ----------------------------------------------------------------------------
>>>> all.q@PING6                    BIP   0/1       -NA-     lx24-x86      au
>>>> ----------------------------------------------------------------------------
>>>> all.q@cluster                  BIP   0/4       0.03     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@sge-glexec1              BIP   0/1       0.00     lx24-x86
>>>> ----------------------------------------------------------------------------
>>>> calcolo@cluster                BIP   0/4       0.03     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> post_processing@sge-glexec1    BIP   0/1       0.00     lx24-x86
>>>>
>>>> Can somebody tell me where I'm making an error?
>>>>
>>>> Thanks
>>>> Luca
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net