[GE users] OpenMP issue with SGE5.3

reuti reuti at staff.uni-marburg.de
Thu May 27 11:25:49 BST 2010



On 27.05.2010, at 11:43, dagru wrote:

> Charpy, 
> 
> yes, MPI would do it. The difference between using -pe make 1 and -pe make 2 is
> that with only 1 slot requested you would allow oversubscription of a host,
> because you are actually using 2 processors while having requested just 1. If an
> overall slot limit of 2 per host is configured on your system, two of your
> programs could then be scheduled by SGE to the same host, which is not really
> what you want.

Yep. To put it another way: any parallel application should be submitted to SGE as a parallel one.

a) you can use $NSLOTS to set OMP_NUM_THREADS in your job script, so you can adjust the number of used slots from run to run on the qsub command line without altering the script (see the example script below this list). This also avoids OpenMP falling back to its default, which is the number of cores it sees in the machine (mostly useful of course when a machine has even more cores and several users share it at the same time).

b) the parallel job will show up in `qstat` with the correct slot count.

c) accounting will be correct, as you reserve two slots.
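
For example, just as a sketch: assuming a PE with allocation_rule $pe_slots exists (here called "smp" purely as a placeholder; see the PE sketch further down), runit.sh could become:

    #$ -S /bin/sh
    #$ -N TEST
    #$ -o $HOME/output.txt
    # "smp" is only a placeholder PE name for a $pe_slots PE
    #$ -pe smp 2
    cd $HOME/
    # hand the slot count granted by SGE to OpenMP
    export OMP_NUM_THREADS=$NSLOTS
    ./ooo

Requesting a different slot count on the command line, e.g. `qsub -pe smp 1 runit.sh`, then changes the thread count without editing the script.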

-- Reuti


> Daniel
> 
> 
> On 05/27/10 10:58, charpy wrote:
>> Hello Daniel, and thank you for your answer,
>> 
>> If I understand correctly, my program won't be able to run across different nodes using only OpenMP. Well, what a shame it's not so easy ^^
>> I guess to do so I need MPI, don't I? Concerning the PE configuration, isn't it possible just to write "-pe make 1" instead of "-pe make 2" in my batch file, which would be equivalent to configuring my PE to run my program on a single node?
>> 
>> Charpy
>> 
>> 
>>> Hi Charpy,
>>> 
>>> OpenMP makes your program multithreaded, which means it can only
>>> exploit the cores of a single multi-core, multi-processor node and cannot
>>> be distributed across different nodes. Using an SGE PE does not somehow
>>> transform your program either. What you need to do is modify your PE so
>>> that it allocates all slots on one single host, which your program can
>>> then use. To do so, set its "allocation_rule" to $pe_slots.
>>> 
>>> Daniel
>>> 
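
For reference, such a PE could look like the sketch below. This is only an illustration: the name "smp" is made up, and apart from the allocation_rule the values simply mirror the existing "make" PE quoted further down in this thread.

    $ qconf -ap smp    # "smp" is just an example name; opens an editor with a PE template
    pe_name           smp
    slots             999
    user_lists        NONE
    xuser_lists       NONE
    start_proc_args   NONE
    stop_proc_args    NONE
    allocation_rule   $pe_slots
    control_slaves    TRUE
    job_is_first_task FALSE
    urgency_slots     min

Depending on the SGE version, the new PE then still has to be made known to the queue (pe_list in the cluster queue configuration, e.g. `qconf -mq all.q`, or the queue listed in the PE's own queue_list) before jobs can request it with "-pe smp <n>".
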
>>> On 05/27/10 09:04, charpy wrote:
>>>> Hello everyone,
>>>> 
>>>> I am a PhD student currently working on a very time- and resource-consuming program. Because of this I parallelised the code with OpenMP and tried to run it on one of the lab's clusters, but it's not working very well... The cluster runs SGE 5.3 and has 8 nodes plus the master one, each node having 2 processors. My problem is that I can only use the processors of one node at a time, even when using qsub.
>>>> 
>>>> Here is an example. I wrote the following simple Fortran code:
>>>> 
>>>>       PROGRAM main
>>>> 
>>>> !$    use omp_lib
>>>>       IMPLICIT none
>>>> 
>>>> !$    CALL OMP_SET_DYNAMIC(.true.)
>>>> !$    CALL OMP_SET_NESTED(.true.)
>>>> 
>>>>       write(*,*) 'OpenMP test'
>>>> !$    write(*,'(a16,i2)') ' - Processors : ',OMP_GET_NUM_PROCS()
>>>> !$    write(*,'(a16,i2)') ' - Threads    : ',OMP_GET_MAX_THREADS()
>>>> 
>>>>       END
>>>> 
>>>> Which is compiled using ifort:
>>>> 
>>>> [charpy@master ~]$ ifort -openmp -o ooo main.f
>>>> ifort: Command line warning: openmp requires C style preprocessing; using fpp to preprocess
>>>> 
>>>> (master is the master node and main.f is the previous code). I can execute the code on a single node when I connect to it through ssh:
>>>> 
>>>> [charpy@master ~]$ ssh n8
>>>> [charpy@node8 ~]$ ./ooo
>>>>  OpenMP test
>>>>  - Processors :  2
>>>>  - Threads    :  2
>>>> 
>>>> The same output appears when it is launched from the master node. Now, if I want to run it on more than one node, I use the following command:
>>>> 
>>>> [charpy@master ~]$ qsub runit.sh
>>>> Your job 1484 ("TEST") has been submitted.
>>>> 
>>>> Where runit.sh is:
>>>> 
>>>> #$ -S /bin/sh
>>>> #$ -N TEST
>>>> #$ -o $HOME/output.txt
>>>> #$ -pe make 2
>>>> cd $HOME/ 
>>>> ./ooo
>>>> 
>>>> Running qstat right after submission gives:
>>>> 
>>>> [charpy@master ~]$ qstat -f
>>>> queuename                      qtype used/tot. load_avg arch          states
>>>> ----------------------------------------------------------------------------
>>>> all.q@master.cluster           BIP   0/2       0.02     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node1.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node2.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node3.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node4.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node5.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node6.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node7.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node8.cluster            BIP   0/2       0.00     lx24-amd64
>>>> 
>>>> ############################################################################
>>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>>> ############################################################################
>>>>    1484 0.00000 TEST        charpy       qw    05/27/2010 08:05:11     2
>>>> 
>>>> And a later one:
>>>> 
>>>> [charpy@master ~]$ qstat -f
>>>> queuename                      qtype used/tot. load_avg arch          states
>>>> ----------------------------------------------------------------------------
>>>> all.q@master.cluster           BIP   0/2       0.02     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node1.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node2.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node3.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node4.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node5.cluster            BIP   1/2       0.00     lx24-amd64
>>>>    1484 0.55500 TEST       charpy       r     05/27/2010 08:07:10     1
>>>> ----------------------------------------------------------------------------
>>>> all.q@node6.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node7.cluster            BIP   0/2       0.00     lx24-amd64
>>>> ----------------------------------------------------------------------------
>>>> all.q@node8.cluster            BIP   1/2       0.00     lx24-amd64
>>>>    1484 0.55500 TEST       charpy       r     05/27/2010 08:07:10     1
>>>> 
>>>> Which suggests that the code is indeed running on two nodes. But if I look at the output file "output.txt", it contains:
>>>> 
>>>> OpenMP test
>>>>  - Processors :  2
>>>>  - Threads    :  2
>>>> 
>>>> Damn! Only two processors! I know I can change the number of threads through the OMP_NUM_THREADS variable, but I can't change the number of processors, even when asking for more threads.
>>>> 
>>>> I think the problem lies in the parallel environment. The PE requested in my batch file is "make" with two slots. On many forums and wikis on the Internet it is recommended to use an "smp", "openmp" or "mpi" environment, but when I ask for the available PEs the answer is:
>>>> 
>>>> [charpy@master ~]$ qconf -spl
>>>> make
>>>> 
>>>> And this PE is configured like:
>>>> 
>>>> [charpy@master ~]$ qconf -sp make
>>>> pe_name           make
>>>> slots             999
>>>> user_lists        NONE
>>>> xuser_lists       NONE
>>>> start_proc_args   NONE
>>>> stop_proc_args    NONE
>>>> allocation_rule   $round_robin
>>>> control_slaves    TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots     min
>>>> 
>>>> So my questions are: is my problem really in the PE? If yes, how can I create another PE that suits OpenMP's requirements (shared memory and multiprocessor)? Can that be done just through qconf, or are there other (more complicated) things to do?
>>>> 
>>>> I thank everyone who read this post to the end, and I hope someone will be able to find the solution to my problem.
>>>> 
>>>> Charpy
>>>> 
>> 



