[GE users] threaded jobs (no PE) and consumable memory

txema_heredia txema.heredia at upf.edu
Thu Jan 14 14:25:55 GMT 2010


Thanks for your answers

> Hi,
> 
> you might try setting an additional "-l h_stack=20M" (or whatever size 
> you need), as some threaded applications allocate the whole stack space 
> (which in older versions of SGE is set equal to the h_vmem size; in 
> newer versions it is left as "unlimited", which avoids this 
> problem).
> 
> Hope it helps,
> Sabine

I have tried it on its own (-l h_stack=4G) and the program runs fine. The problem is that this attribute does not appear to be restrictive at all: with -l h_stack=50M the job is still able to use 4G without receiving any termination signal (unlike -l h_vmem=50M, which aborts the job as soon as the limit is exceeded).

If I combine -l h_stack and -l h_vmem, h_vmem takes precedence and the job is killed as usual (the failed malloc followed by the segmentation violation).
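For reference, a quick way to check whether h_stack actually reaches the job is to submit a trivial job that prints its own limits (just a sketch; the output file name is arbitrary):

  # submit with only a stack request and see what the job really gets
  echo 'ulimit -s; ulimit -v' | qsub -l h_stack=50M -cwd -j y -o stack_check.out

If h_stack were being enforced as the stack rlimit, ulimit -s should report 51200 (kB).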


> Hi,
> 
> Am 13.01.2010 um 18:31 schrieb txema_heredia:
> 
> > Hi all,
> >
> > I've found a problem in SGE 6.1u4 regarding threaded jobs (without  
> > using PE) when submitted to a host with an h_vmem request:
> >
> > I want to run "blastall -a 8"  in my cluster (the -a allows the  
> > process to use N threads to run its analysis, but it doesn't  
> > require a parallel environment, it uses libpthread.so.0).
> 
> a PE does not provide any parallelization functionality. It just  
> tells SGE that this is a parallel job and makes any preparations  
> needed for the parallel library your job uses. If you just submit a  
> serial job and then use threads for the parallel tasks, SGE will  
> overload a node by putting too many jobs on it.
> 
> You will need a PE, often called 'smp'; keep the default settings  
> when you define it and then attach it to a queue. Note that this will  
> also multiply the resource request, so it may be necessary to submit  
> with a lower value, as the request is meant as consumption per task.
> 
> --Reuti
> 
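As a concrete sketch of that suggestion (the slot count and queue name here are placeholders for whatever fits the cluster), such a PE definition looks roughly like this:

  $ qconf -sp smp
  pe_name            smp
  slots              999
  user_lists         NONE
  xuser_lists        NONE
  start_proc_args    /bin/true
  stop_proc_args     /bin/true
  allocation_rule    $pe_slots
  control_slaves     FALSE
  job_is_first_task  TRUE
  urgency_slots      min

  # attach it to a queue, e.g.:
  $ qconf -aattr queue pe_list smp all.q

allocation_rule $pe_slots is the important part for threaded jobs, as it forces all the requested slots onto a single host.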

I created the smp PE as you suggested, and it works fine on its own, but when combined with -l h_vmem the problem remains: the failed malloc is still there, only now multiplied by the slot count given to -pe:


-pe smp 4
-l h_vmem=1G
4 threads

mmap(NULL, 4294971392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40, -1, 0) = -1 ENOMEM (Cannot allocate memory)

4294971392 = 4 * 1G (plus one 4096-byte page)

---------------------------------------------------------------------------------------------

-pe smp 8
-l h_vmem=1G
8 threads

mmap(NULL, 8589938688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40, -1, 0) = -1 ENOMEM (Cannot allocate memory)

8589938688 = 8 * 1G (plus one 4096-byte page)

---------------------------------------------------------------------------------------------

-pe smp 8
-l h_vmem=5G
8 threads

mmap(NULL, 42949677056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40, -1, 0) = -1 ENOMEM (Cannot allocate memory)

42949677056 = 8 * 5G (plus one 4096-byte page)


So the problem is that, whenever a memory request (h_vmem or s_vmem) is given, the program tries to allocate the whole available memory in a single mmap() before creating its threads.

I have tried to replicate this behaviour using a plain ulimit -v, but then the program ran correctly. What does h_vmem really do to the job, or even to Linux?
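One way to see exactly what the job ends up with is to submit a trivial job and dump its limits (again just a sketch; output file names are arbitrary):

  # serial job: does the address-space limit match h_vmem?
  echo 'ulimit -v; cat /proc/self/limits' | qsub -l h_vmem=1G -cwd -j y -o limits_serial.out

  # PE job: if my reading of the strace output is right, the limit
  # should come out as slots x h_vmem (here 4G)
  echo 'ulimit -v; cat /proc/self/limits' | qsub -pe smp 4 -l h_vmem=1G -cwd -j y -o limits_pe.out

The "Max address space" line in /proc/self/limits (the same limit ulimit -v reports, in kB) is what a large mmap() like the ones above runs into with ENOMEM.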
