[GE users] SMP trouble

Luca Tola luca.tola at phitecingegneria.it
Mon Sep 1 16:16:48 BST 2008



Chris Dagdigian wrote:
>
> Your configuration looks OK at first glance. The PE looks good, the PE 
> is attached to the calcolo queue and your qstat shows empty queues 
> with one queue down but the others idle and available.
>
> How do you know that your job is not running 4 threads via SMP? 
With the "top" command. After pressing "1" it shows each CPU separately, and only one of them is busy.
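
A non-interactive way to cross-check what "top" shows is to count the threads of the running process. This is only a sketch, assuming a Linux execution host with /proc mounted; the helper name count_threads is made up for illustration. A job that was granted 4 slots but reports a single thread is not running in parallel, whatever the scheduler shows.

```shell
#!/bin/sh
# Hypothetical helper: count the threads of a process on Linux.
# Each thread appears as one directory under /proc/<pid>/task.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

# Example: inspect the current shell (replace $$ with the job's PID).
echo "threads in this shell: $(count_threads "$$")"
```

On the execution host, the PID of the job's process would be passed instead of $$.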

> Does it run fine on 4 CPUs outside of Grid Engine on that host?
I don't know. But when I submit the same job on the cluster "without" 
SGE, everything works fine.
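
One possible difference between the two cases is the environment the job sees. Grid Engine exports the number of granted slots to the job as NSLOTS, but it does not start any threads itself. If the solver happens to be an OpenMP program (an assumption, nothing in this thread confirms it), a job script along these lines would pass the slot count on. The function name pick_threads and the commented-out solver command are hypothetical.

```shell
#!/bin/sh
# Sketch of a job script for: qsub -q calcolo -pe smp4 4 run_eng.sh
# Assumption: the solver honours OMP_NUM_THREADS (OpenMP). Not confirmed.

# Use the slot count granted by Grid Engine; fall back to 1 outside SGE.
pick_threads() {
    echo "${NSLOTS:-1}"
}

OMP_NUM_THREADS=$(pick_threads)
export OMP_NUM_THREADS
echo "running with $OMP_NUM_THREADS thread(s)"

# ./solver input.dat   # hypothetical solver command, left commented out
```

If the program takes its thread count from a command-line option or its own config file instead, the same NSLOTS value would have to be passed there.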
>
> From SGE's qstat view you would expect to see "4/4" in the slots 
> column for the calcolo queue indicating that 4 out of 4 available 
> slots are blocked/consumed by your SGE job. There would really be no 
> other indication from SGE other than that if the job runs and is 
> dispatched OK.
Here is my "qstat -f" output while the job is running under SGE:

[sgeadmin@sge jobs]$ qstat -f

queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@PING6                    BIP   0/1       -NA-     lx24-x86      au
----------------------------------------------------------------------------
all.q@cluster                  BIP   0/4       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q@sge-glexec1              BIP   0/1       0.03     lx24-x86
----------------------------------------------------------------------------
calcolo@cluster                BIP   4/4       0.00     lx24-amd64
    467 0.55500 run_eng sgeadmin     r     09/01/2008 17:03:13     4
----------------------------------------------------------------------------
post_processing@sge-glexec1    BIP   0/1       0.03     lx24-x86
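
To check the slot count from a script rather than by eye, the used/tot. field can be pulled out with awk. This is a rough sketch, assuming the column layout of this SGE version (queue instance in column 1 as queue@host, used/tot. in column 3); the helper name slots_for is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the used/tot. field for one queue instance,
# reading "qstat -f" output on stdin. Assumes used/tot. is column 3.
slots_for() {
    awk -v q="$1" '$1 == q { print $3 }'
}

# Sample lines in the format qstat -f prints; on a live system one would
# run:  qstat -f | slots_for calcolo@cluster
slots_for calcolo@cluster <<'EOF'
all.q@cluster                  BIP   0/4       0.00     lx24-amd64
calcolo@cluster                BIP   4/4       0.00     lx24-amd64
EOF
```

A value of 4/4 only says that the scheduler has reserved the slots; it says nothing about how many CPUs the job actually uses.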




> -Chris
>
>
>
> On Sep 1, 2008, at 10:37 AM, Luca Tola wrote:
>
>> Hi all,
>> I'm trying to run a parallel SMP job on a single 4-CPU execution host.
>> I've created my PE with "qconf -ap smp4":
>>
>> pe_name            smp4
>> slots              4
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    NONE
>> stop_proc_args     NONE
>> allocation_rule    $pe_slots
>> control_slaves     FALSE
>> job_is_first_task  TRUE
>> urgency_slots      max
>>
>> My queue configuration is:
>> qname                 calcolo
>> hostlist              cluster
>> seq_no                0
>> load_thresholds       np_load_avg=1.75
>> suspend_thresholds    NONE
>> nsuspend              1
>> suspend_interval      00:05:00
>> priority              0
>> min_cpu_interval      00:05:00
>> processors            4
>> qtype                 BATCH INTERACTIVE
>> ckpt_list             NONE
>> pe_list               make smp4
>> rerun                 FALSE
>> slots                 4
>> tmpdir                /tmp
>> shell                 /bin/csh
>> prolog                NONE
>> epilog                NONE
>> ...
>> ...
>> ...
>> default
>> ...
>> ...
>>
>> When I submit with:
>> qsub -q calcolo -pe smp4 4 run_eng.sh
>> the job runs, but on only one CPU instead of all the CPUs in my
>> execution host.
>>
>> qstat -f
>>
>> queuename                      qtype used/tot. load_avg arch          states
>> ----------------------------------------------------------------------------
>> all.q@PING6                    BIP   0/1       -NA-     lx24-x86      au
>> ----------------------------------------------------------------------------
>> all.q@cluster                  BIP   0/4       0.03     lx24-amd64
>> ----------------------------------------------------------------------------
>> all.q@sge-glexec1              BIP   0/1       0.00     lx24-x86
>> ----------------------------------------------------------------------------
>> calcolo@cluster                BIP   0/4       0.03     lx24-amd64
>> ----------------------------------------------------------------------------
>> post_processing@sge-glexec1    BIP   0/1       0.00     lx24-x86
>>
>> Can somebody tell me where I'm making an error?
>>
>> Thanks
>> Luca
>>
>>
>>
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

