[GE users] SMP trouble

Luca Tola luca.tola at phitecingegneria.it
Tue Sep 2 14:00:04 BST 2008



Reuti wrote:
> Am 01.09.2008 um 18:09 schrieb Luca Tola:
>
>> Reuti wrote:
>>> Hi,
>>>
>>> Am 01.09.2008 um 17:51 schrieb Luca Tola:
>>>
>>>> Reuti wrote:
>>>>> Hi,
>>>>>
>>>>> Am 01.09.2008 um 16:37 schrieb Luca Tola:
>>>>>
>>>>>> Hi all,
>>>>>> I'm trying to run a parallel SMP job on my 4-CPU execution host.
>>>>>> I've created my PE with qconf -ap smp4 :
>>>>>> pe_name smp4
>>>>>> slots 4
>>>>>
>>>>> this is the number of slots for this PE across all hosts; most 
>>>>> often you will set it to a higher value. The per-node limit is 
>>>>> in the queue definition, which you correctly set to 4 there.
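
[Editor's note: for example, a PE whose total capacity across the cluster exceeds the per-host limit might look like this (a sketch; the per-host cap of 4 then comes from the queue's own "slots" setting):]

pe_name          smp4
slots            999          # total across all hosts for this PE
allocation_rule  $pe_slots    # all of a job's slots on one host
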
>>>>>
>>>>> How did you start your parallel application? Did you check the 
>>>>> load with "top" or "uptime"? Is your parallel app 100% parallel?
>>>>>
>>>> With the "top" command I check the CPU load on the execution host.
>>>>
>>>> What do you mean by parallel? In the traditional way, when I run 
>>>> any script on the cluster, the load splits across the 4 CPUs; 
>>>> that way each CPU works at only 25%.
>>>
>>> with a real parallel job, every CPU should be used at 100% and the 
>>> load of the machine should be at around 4.00.
>>>
>>
>> Sorry, you're right.
>> When I run any script in the traditional way, every CPU is used at 100%.
>
> Aha, what is in the script? Any definitions of environment variables?
>
> -- Reuti
>
>

I fixed it. My run_eng.sh script was wrong.
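
[Editor's note: the likely culprit is the spaced assignment quoted later in the thread ("export OMP_NUM_THREADS = 4"); in POSIX sh, the spaces around "=" split the line into separate words, so the variable never receives the value. A minimal corrected sketch of that part of run_eng.sh (the rest of the script is elided in the thread):]

```shell
#!/bin/sh
# Corrected excerpt of run_eng.sh (a sketch; the rest of the script is
# elided in the thread).  Note: no spaces around "=".  The broken form
# "export OMP_NUM_THREADS = 4" passes "OMP_NUM_THREADS", "=", and "4"
# to export as three separate arguments, so the variable never gets
# the value 4.
export OMP_NUM_THREADS=4
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```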

Thank you very much

>>>> With SGE, 3 CPUs sit idle, but 1 works at 100%.
>>>
>>> This is up to the scheduler in Linux, which places the process on 
>>> the core with the best possible cache hits or according to other 
>>> criteria. Whether it's just one core or several shouldn't affect 
>>> the wallclock runtime of the job.
>>>
>>> -- Reuti
>>>
>>>
>>>> I'd really like to be able to use all of my CPUs instead of just 
>>>> one of the four.
>>>>
>>>>
>>>>> Only in the case of Open MPI is some automatism included to 
>>>>> detect the machinefile and the number of cores on its own. With 
>>>>> other parallel libraries, you have to tell your program the 
>>>>> number of forks or threads.
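
[Editor's note: for libraries that need an explicit thread count, one common pattern is to derive it from the PE request instead of hard-coding it. A sketch, assuming an OpenMP program; $NSLOTS is the variable Grid Engine sets in the job environment to the number of granted slots:]

```shell
#!/bin/sh
# Sketch of a job script: take the OpenMP thread count from SGE's
# $NSLOTS (the number of slots granted by e.g. "-pe smp4 4"), falling
# back to 1 when the script is run outside of SGE.
export OMP_NUM_THREADS=${NSLOTS:-1}
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```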
>>>>>
>>>>
>>>> I did. In my run_eng.sh I've written:
>>>> ...
>>>> ...
>>>> export OMP_NUM_THREADS = 4
>>>> ...
>>>> ...
>>>>
>>>>> What parallel lib are you using?
>>>>>
>>>>
>>>> I don't know. How can I check?
>>>>
>>>> Thanks for all
>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> user_lists NONE
>>>>>> xuser_lists NONE
>>>>>> start_proc_args NONE
>>>>>> stop_proc_args NONE
>>>>>> allocation_rule $pe_slots
>>>>>> control_slaves FALSE
>>>>>> job_is_first_task TRUE
>>>>>> urgency_slots max
>>>>>>
>>>>>> My queue configuration is:
>>>>>> qname calcolo
>>>>>> hostlist cluster
>>>>>> seq_no 0
>>>>>> load_thresholds np_load_avg=1.75
>>>>>> suspend_thresholds NONE
>>>>>> nsuspend 1
>>>>>> suspend_interval 00:05:00
>>>>>> priority 0
>>>>>> min_cpu_interval 00:05:00
>>>>>> processors 4
>>>>>> qtype BATCH INTERACTIVE
>>>>>> ckpt_list NONE
>>>>>> pe_list make smp4
>>>>>> rerun FALSE
>>>>>> slots 4
>>>>>> tmpdir /tmp
>>>>>> shell /bin/csh
>>>>>> prolog NONE
>>>>>> epilog NONE
>>>>>> ...
>>>>>> ...
>>>>>> ...
>>>>>> default
>>>>>> ...
>>>>>> ...
>>>>>>
>>>>>> When I try to submit with:
>>>>>> qsub -q calcolo -pe smp4 4 run_eng.sh
>>>>>> the job runs but on only one CPU instead of all CPU in my 
>>>>>> execution host.
>>>>>>
>>>>>> qstat -f
>>>>>>
>>>>>> queuename                    qtype used/tot. load_avg arch       states
>>>>>> ----------------------------------------------------------------------
>>>>>> all.q@PING6                  BIP   0/1       -NA-     lx24-x86   au
>>>>>> ----------------------------------------------------------------------
>>>>>> all.q@cluster                BIP   0/4       0.03     lx24-amd64 
>>>>>> ----------------------------------------------------------------------
>>>>>> all.q@sge-glexec1            BIP   0/1       0.00     lx24-x86   
>>>>>> ----------------------------------------------------------------------
>>>>>> calcolo@cluster              BIP   0/4       0.03     lx24-amd64 
>>>>>> ----------------------------------------------------------------------
>>>>>> post_processing@sge-glexec1  BIP   0/1       0.00     lx24-x86
>>>>>>
>>>>>> Can somebody tell me where I'm making an error?
>>>>>>
>>>>>> Thanks
>>>>>> Luca
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------- 
>>>>>>
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>





