[GE users] SMP trouble

Chris Dagdigian dag at sonsorol.org
Mon Sep 1 15:43:42 BST 2008


Your configuration looks OK at first glance. The PE definition looks
good, the PE is attached to the calcolo queue, and your qstat output
shows empty queues: one queue instance is down (all.q on PING6, state
"au") but the others are idle and available.
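
For reference, a minimal sketch of the "PE is attached to the queue"
check. `qconf -sq <queue>` prints the queue configuration, including
its pe_list line. The helper name pe_attached is made up for
illustration, and it only handles the simple case of a plain
space-separated pe_list with no per-host overrides.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given PE appears on the pe_list
# line of a queue configuration read from stdin.
pe_attached() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            # Fields 2..NF are the PE names attached to the queue.
            for (i = 2; i <= NF; i++)
                if ($i == pe) found = 1
        }
        END { exit !found }   # exit status 0 only if the PE was seen
    '
}

# Intended use on a host with Grid Engine installed:
#   qconf -sq calcolo | pe_attached smp4 && echo "smp4 is attached"
```

With the queue configuration quoted below (pe_list "make smp4"), this
check should succeed for smp4.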

How do you know that your job is not running 4 threads via SMP? Does  
it run fine on 4 CPUs outside of Grid Engine on that host?
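
One way to answer that, as a rough sketch: look at the thread count of
the job's process on the execution host while it runs. This assumes a
Linux host (the qstat output below shows lx24-amd64), where
/proc/<pid>/status carries a "Threads:" line. The helper name
count_threads and its optional second argument are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the number of threads of a process.
#   $1 = PID of the running job process
#   $2 = proc filesystem root (optional, defaults to /proc)
count_threads() {
    pid=$1
    proc_root=${2:-/proc}
    # The status file contains a line such as "Threads:   4".
    awk '/^Threads:/ { print $2 }' "$proc_root/$pid/status"
}

# Intended use on the execution host, with the PID of the solver
# process started by run_eng.sh:
#   count_threads 12345
```

If this reports 1 for the solver both inside and outside Grid Engine,
the application itself is not starting extra threads.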

From SGE's point of view, you would expect qstat to show "4/4" in the
slots (used/tot.) column for the calcolo queue, indicating that 4 out
of 4 available slots are consumed by your SGE job. If the job is
dispatched and runs OK, that is really the only indication SGE itself
will give you.
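
A small sketch of that check, assuming the `qstat -f` layout shown in
the quoted output below, where the third column is used/tot. and queue
instances are named queue@host. The helper name queue_slots is made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the slots column for one queue instance.
# Reads `qstat -f` output on stdin; $1 is the instance name.
queue_slots() {
    awk -v q="$1" '$1 == q { print $3 }'
}

# Intended use while the job is running:
#   qstat -f | queue_slots calcolo@cluster
```

While a 4-slot job in the smp4 PE is running on that host, this should
print 4/4 rather than 0/4.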


-Chris



On Sep 1, 2008, at 10:37 AM, Luca Tola wrote:

> Hi all,
> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
> I've created my PE with qconf -ap smp4:
>
> pe_name           smp4
> slots             4
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   NONE
> stop_proc_args    NONE
> allocation_rule   $pe_slots
> control_slaves    FALSE
> job_is_first_task TRUE
> urgency_slots     max
>
> My queue configuration is:
> qname                 calcolo
> hostlist              cluster
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            4
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               make smp4
> rerun                 FALSE
> slots                 4
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> ...
> ...
> ...
> default
> ...
> ...
>
> When I try to submit with:
> qsub -q calcolo -pe smp4 4 run_eng.sh
> the job runs, but on only one CPU instead of all the CPUs in my
> execution host.
>
> qstat -f
>
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@PING6                    BIP   0/1       -NA-     lx24-x86      au
> ----------------------------------------------------------------------------
> all.q@cluster                  BIP   0/4       0.03     lx24-amd64
> ----------------------------------------------------------------------------
> all.q@sge-glexec1              BIP   0/1       0.00     lx24-x86
> ----------------------------------------------------------------------------
> calcolo@cluster                BIP   0/4       0.03     lx24-amd64
> ----------------------------------------------------------------------------
> post_processing@sge-glexec1    BIP   0/1       0.00     lx24-x86
>
> Can somebody tell me where I'm making an error?
>
> Thanks
> Luca
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

