[GE users] SMP trouble

Luca Tola luca.tola at phitecingegneria.it
Mon Sep 1 16:51:01 BST 2008


Reuti wrote:
> Hi,
>
> On 01.09.2008 at 16:37, Luca Tola wrote:
>
>> Hi all,
>> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
>> I've created my PE with qconf -ap smp4:
>> pe_name smp4
>> slots 4
>
> this is the number of slots for this PE across all hosts. Most often 
> you will set this to a higher value. The limit per node is in the 
> queue definition, which you set correctly to 4 there.
>
> How did you start your parallel application - you checked the load 
> with "top" or "uptime"? Is your parallel app 100% parallel?
>
I check the CPU load on the execution host with the "top" command.

What do you mean by parallel? In the traditional way, when I run any script 
on the cluster, the load is split across all 4 CPUs, so each CPU works at 
only 25%.
With SGE, 3 CPUs stay idle while 1 works at 100%.
I'd really like to be able to use all four of my CPUs instead of just one.
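
A minimal sketch for counting how many threads of a running process are actually busy, assuming a Linux host whose `ps` supports `-L` and the `lwp`/`pcpu` columns (procps). The helper name `count_busy_threads` and the 50% threshold are illustrative and not from this thread. `pcpu` is averaged over each thread's lifetime, so treat the result as a rough indicator only:

```shell
#!/bin/sh
# Read "LWP %CPU" pairs on stdin and print how many threads are at or
# above the given %CPU threshold (default 50; an arbitrary choice).
count_busy_threads() {
    awk -v min="${1:-50}" '$2 + 0 >= min { n++ } END { print n + 0 }'
}

# Usage: pass the PID of the running solver as the first argument.
if [ -n "${1:-}" ]; then
    ps -L -o lwp=,pcpu= -p "$1" | count_busy_threads
fi
```

A result of 1 for a job that was granted 4 slots would match the symptom described above: one thread doing all the work.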


> Only in the case of Open MPI is some automatism included, to detect the 
> machinefile and number of cores on its own. With other parallel 
> libraries, you have to tell your program the number of forks or threads.
>

I did. In my run_eng.sh I've written:
...
...
export OMP_NUM_THREADS=4
...
...
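
A possible shape for the job script, as a sketch under stated assumptions. It assumes the application really is OpenMP-based (not confirmed in this thread) and that the script should run under a Bourne-style shell, where an assignment must have no spaces around `=`. The queue shown in this thread lists /bin/csh as its shell, and `export` is not a csh command, so depending on the queue's shell_start_mode it may be necessary to request /bin/sh explicitly with `-S`. Grid Engine sets `NSLOTS` to the number of slots granted to a parallel job; the helper name `threads_for_job` and the commented-out solver line are hypothetical:

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp4 4

# Print the thread count to use: NSLOTS when Grid Engine has set it,
# otherwise 1 (for example when the script is run by hand).
threads_for_job() {
    echo "${NSLOTS:-1}"
}

OMP_NUM_THREADS=$(threads_for_job)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# ./solver input.dat   # hypothetical command; replace with the real application
```

Taking the value from `NSLOTS` instead of hard-coding 4 keeps the thread count in step with whatever is requested with `-pe` at submission time.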

> What parallel lib are you using?
>

I don't know. How can I check?
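
One way to check, sketched under assumptions: on Linux, `ldd` lists the shared libraries a dynamically linked binary loads, and the names often reveal the parallel runtime (libgomp for GCC OpenMP, libiomp5 or libguide for Intel OpenMP, libmpi or libmpich for MPI implementations). The helper name `detect_parallel_libs` is illustrative, `grep -o` is assumed to be available (GNU or BSD grep), and a statically linked binary will show nothing here:

```shell
#!/bin/sh
# Read ldd-style output on stdin and print the names of any well-known
# parallel runtime libraries found in it, one per line, without duplicates.
detect_parallel_libs() {
    grep -E -o 'lib(gomp|iomp5|guide|mpi|mpich|pthread)[^ ]*' | sort -u
}

# Usage: pass the path of the real executable (not the wrapper script)
# as the first argument.
if [ -n "${1:-}" ]; then
    ldd "$1" | detect_parallel_libs
fi
```

An empty result for a dynamically linked binary would suggest the program is not using any of these runtimes, in which case setting OMP_NUM_THREADS would have no effect.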

Thanks for everything

> -- Reuti
>
>
>> user_lists NONE
>> xuser_lists NONE
>> start_proc_args NONE
>> stop_proc_args NONE
>> allocation_rule $pe_slots
>> control_slaves FALSE
>> job_is_first_task TRUE
>> urgency_slots max
>>
>> My queue configuration is:
>> qname calcolo
>> hostlist cluster
>> seq_no 0
>> load_thresholds np_load_avg=1.75
>> suspend_thresholds NONE
>> nsuspend 1
>> suspend_interval 00:05:00
>> priority 0
>> min_cpu_interval 00:05:00
>> processors 4
>> qtype BATCH INTERACTIVE
>> ckpt_list NONE
>> pe_list make smp4
>> rerun FALSE
>> slots 4
>> tmpdir /tmp
>> shell /bin/csh
>> prolog NONE
>> epilog NONE
>> ...
>> ...
>> ...
>> default
>> ...
>> ...
>>
>> When I try to submit with:
>> qsub -q calcolo -pe smp4 4 run_eng.sh
>> the job runs, but on only one CPU instead of all the CPUs in my execution 
>> host.
>>
>> qstat -f
>>
>> queuename                   qtype used/tot. load_avg arch       states
>> ----------------------------------------------------------------------------
>> all.q@PING6                 BIP   0/1       -NA-     lx24-x86   au
>> ----------------------------------------------------------------------------
>> all.q@cluster               BIP   0/4       0.03     lx24-amd64
>> ----------------------------------------------------------------------------
>> all.q@sge-glexec1           BIP   0/1       0.00     lx24-x86
>> ----------------------------------------------------------------------------
>> calcolo@cluster             BIP   0/4       0.03     lx24-amd64
>> ----------------------------------------------------------------------------
>> post_processing@sge-glexec1 BIP   0/1       0.00     lx24-x86
>>
>> Can somebody tell me where I'm making an error?
>>
>> Thanks
>> Luca
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net



