[GE users] OpenMP issue with SGE5.3

dagru d.gruber at sun.com
Thu May 27 10:43:36 BST 2010



Charpy,

yes, MPI would do it. The difference when using -pe make 1 instead of 2 is
that you would allow oversubscription of a host, because you would actually be using
2 processors while requesting only 1. If an overall slot limit of 2 per host is configured
on your system, two of your programs could be scheduled by SGE onto the same
host, which is not really what you want.
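To make that concrete, a sketch of the submit script from this thread (PE name "make" as configured on this cluster): requesting as many slots as the program has OpenMP threads keeps SGE's accounting in line with actual usage, so the scheduler will not pack a second two-thread job onto the same host:

```shell
#$ -S /bin/sh
#$ -N TEST
#$ -o $HOME/output.txt
#$ -pe make 2     # request 2 slots: one per OpenMP thread the job will run
cd $HOME/
./ooo
```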

Daniel


On 05/27/10 10:58, charpy wrote:

Hello Daniel, and thank you for your answer,

If I understand well, my program won't be able to run over different nodes using only OpenMP. Well, what a shame it's not so easy ^^
I guess to do so I need MPI, don't I? Concerning the PE configuration, isn't it possible just to write "-pe make 1" instead of "-pe make 2" in my batch file, which would be equivalent to configuring my PE to run my program on a single node?

Charpy



Hi Charpy,

OpenMP makes your program multithreaded, which means it can only
exploit multi-core, multi-processor nodes but cannot be distributed over
different nodes. Using SGE PEs does not transform your program
in that respect either. What you need to do is modify your PE so that it
allocates slots only on one single host, which your program can then use.
In order to do so you must set your "allocation_rule" to $pe_slots.
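As a sketch (the PE name "smp" is illustrative, not something already configured on this cluster), such a shared-memory PE could look like this; the values for control_slaves and job_is_first_task are typical for a single-host PE, since no remote slave tasks need to be controlled:

```shell
$ qconf -sp smp
pe_name           smp
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   NONE
stop_proc_args    NONE
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
```

You would create it with "qconf -ap smp", add it to the queue's pe_list (e.g. via "qconf -mq all.q"), and then submit with "-pe smp 2".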

Daniel

On 05/27/10 09:04, charpy wrote:


Hello everyone,

I am a PhD student and currently working on a very time- and resource-consuming program. Because of this I parallelised the code using OpenMP directives, and tried to run it on one of the lab's clusters, but it's not working very well... The cluster is running SGE 5.3 and has 8 nodes plus the master one, each node having 2 processors. My problem is that I can only use the processors of one node at a time, even when using qsub.

Here is an example; I wrote the following simple Fortran code:

      PROGRAM main

!$    use omp_lib
      IMPLICIT none

!$    CALL OMP_SET_DYNAMIC(.true.)
!$    CALL OMP_SET_NESTED(.true.)

      write(*,*) 'OpenMP test'
!$    write(*,'(a16,i2)') ' - Processors : ',OMP_GET_NUM_PROCS()
!$    write(*,'(a16,i2)') ' - Threads    : ',OMP_GET_MAX_THREADS()

      END

Which is compiled using ifort :

[charpy at master ~]$ ifort -openmp -o ooo main.f
ifort: Command line warning: openmp requires C style preprocessing; using fpp to preprocess

(master is the master node and main.f is the previous code). I can execute the code on a single node when I connect to it through ssh :

[charpy at master ~]$ ssh n8
[charpy at node8 ~]$ ./ooo
 OpenMP test
 - Processors :  2
 - Threads    :  2

The same output is given when launched from the master node. Now if I want to run it on more than one node I use the following command :

[charpy at master ~]$ qsub runit.sh
Your job 1484 ("TEST") has been submitted.

Where runit.sh is :

#$ -S /bin/sh
#$ -N TEST
#$ -o $HOME/output.txt
#$ -pe make 2
cd $HOME/
./ooo

A fast execution of qstat gives :

[charpy at master ~]$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q at master.cluster            BIP   0/2       0.02     lx24-amd64
----------------------------------------------------------------------------
all.q at node1.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node2.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node3.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node4.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node5.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node6.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node7.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node8.cluster             BIP   0/2       0.00     lx24-amd64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
   1484 0.00000 TEST        charpy       qw    05/27/2010 08:05:11     2

And a later one :

[charpy at master ~]$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q at master.cluster            BIP   0/2       0.02     lx24-amd64
----------------------------------------------------------------------------
all.q at node1.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node2.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node3.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node4.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node5.cluster             BIP   1/2       0.00     lx24-amd64
   1484 0.55500 TEST       charpy       r     05/27/2010 08:07:10     1
----------------------------------------------------------------------------
all.q at node6.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node7.cluster             BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
all.q at node8.cluster             BIP   1/2       0.00     lx24-amd64
   1484 0.55500 TEST       charpy       r     05/27/2010 08:07:10     1

Which suggests that the code is indeed running on two nodes. But if I look at the output file "output.txt", it is:

OpenMP test
 - Processors :  2
 - Threads    :  2

Damn! Only two processors! I know I can change the number of threads through the OMP_NUM_THREADS variable, but I can't change the number of processors, even when asking for more threads.
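For reference, a sketch of how the thread count is usually tied to the slot request in the job script itself (NSLOTS is an environment variable that SGE sets at job runtime to the number of granted slots):

```shell
#$ -S /bin/sh
#$ -N TEST
#$ -pe make 2
# Use as many OpenMP threads as slots granted; fall back to 2 if run outside SGE.
export OMP_NUM_THREADS=${NSLOTS:-2}
cd $HOME/
./ooo
```

This keeps the program's thread count in step with whatever is requested via -pe, but it cannot raise the processor count beyond what one node physically has.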

I think the problem lies in the parallel environment. Indeed, the PE requested in my batch file is "make" with two slots. On the Internet, many forums and wikis recommend using an "smp", "openmp" or "mpi" environment, but when I ask for the available PEs the answer is:

[charpy at master ~]$ qconf -spl
make

And this PE is configured like:

[charpy at master ~]$ qconf -sp make
pe_name           make
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   NONE
stop_proc_args    NONE
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

So my questions are: is my problem really in the PE? If yes, how can I create another PE that will suit OpenMP requirements (shared memory and multiple processors)? Just through qconf, or are there other (more complicated) things to do?

I thank everyone who read this post to the end, and hope someone will be able to find a solution to my problem.

Charpy

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258914

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].




------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258938






More information about the gridengine-users mailing list