[GE users] SMP trouble

Reuti reuti at staff.uni-marburg.de
Mon Sep 1 15:45:42 BST 2008


Hi,

On 01.09.2008 at 16:37, Luca Tola wrote:

> Hi all,
> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
> I've created my PE with qconf -ap smp4:
> pe_name             smp4
> slots                    4

This is the total number of slots for this PE across all hosts, so
most often you will set it to a higher value. The limit per node is
in the queue definition, where you correctly set slots to 4.

How did you start your parallel application, and did you check the
load with "top" or "uptime"? Is your parallel app 100% parallel?
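
A rough way to do that check on the execution host while the job is
running. This is only a sketch and assumes a Linux host (the lx24
architecture in your qstat output suggests one) where /proc/loadavg
exists; the variable names are just illustrative. A load close to the
core count suggests all cores are busy, while a load near 1 suggests
a single busy process.

```shell
#!/bin/sh
# Number of cores currently online on this host.
cores=$(getconf _NPROCESSORS_ONLN)

# 1-minute load average: first field of /proc/loadavg (Linux only).
load=$(cut -d' ' -f1 /proc/loadavg)

echo "cores: $cores  1-min load: $load"
```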

Only Open MPI includes some automatism that detects the machinefile
and the number of cores on its own. With other parallel libraries,
you have to tell your program the number of processes or threads to
start.
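
For a threaded program, a job script along these lines could pass the
granted slot count on to the application. It is a sketch under
assumptions: it assumes the application uses OpenMP (which reads
OMP_NUM_THREADS), the helper thread_count is made up for this
example, and the final command is a placeholder for whatever
run_eng.sh really starts. NSLOTS is set by Grid Engine for jobs
submitted with -pe.

```shell
#!/bin/sh
#$ -q calcolo
#$ -pe smp4 4

# Grid Engine exports NSLOTS with the number of slots granted to the job.
# Fall back to 1 so the script also works outside the scheduler.
thread_count() {
    echo "${NSLOTS:-1}"
}

# Assumption: the application is an OpenMP program and honours OMP_NUM_THREADS.
OMP_NUM_THREADS=$(thread_count)
export OMP_NUM_THREADS

echo "starting with $OMP_NUM_THREADS threads"
# ./my_parallel_app    # hypothetical name; replace with the real command
```

Other libraries take the count differently (a command-line option or
another environment variable), so the only generic part here is
reading $NSLOTS instead of hard-coding 4.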

What parallel lib are you using?

-- Reuti


> user_lists            NONE
> xuser_lists          NONE
> start_proc_args  NONE
> stop_proc_args  NONE
> allocation_rule   $pe_slots
> control_slaves    FALSE
> job_is_first_task TRUE
> urgency_slots     max
>
> My queue configuration is:
> qname                        calcolo
> hostlist                        cluster
> seq_no                        0
> load_thresholds          np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend                   1
> suspend_interval       00:05:00
> priority                      0
> min_cpu_interval      00:05:00
> processors                4
> qtype                        BATCH INTERACTIVE
> ckpt_list                   NONE
> pe_list                      make smp4
> rerun                       FALSE
> slots                         4
> tmpdir                     /tmp
> shell                        /bin/csh
> prolog                     NONE
> epilog                      NONE
> ...
> ...
> ...
> default
> ...
> ...
>
> When I try to submit with:
> qsub -q calcolo -pe smp4 4 run_eng.sh
> the job runs, but on only one CPU instead of all CPUs of my
> execution host.
>
> qstat -f
>
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@PING6                    BIP   0/1       -NA-     lx24-x86      au
> ----------------------------------------------------------------------------
> all.q@cluster                  BIP   0/4       0.03     lx24-amd64
> ----------------------------------------------------------------------------
> all.q@sge-glexec1              BIP   0/1       0.00     lx24-x86
> ----------------------------------------------------------------------------
> calcolo@cluster                BIP   0/4       0.03     lx24-amd64
> ----------------------------------------------------------------------------
> post_processing@sge-glexec1    BIP   0/1       0.00     lx24-x86
>
> Can somebody tell me where I'm making an error?
>
> Thanks
> Luca
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





