[GE users] multiplied resource limits with pe

Gerd Marquardt marquardt at rrzn.uni-hannover.de
Fri Aug 25 10:55:47 BST 2006


Reuti wrote:
> On 24.08.2006 at 17:15, Gerd Marquardt wrote:
>
>> Hello,
>> we have a cluster with 4-way nodes.
>> When we run a simple job that uses a parallel environment and requests
>> some resource limits, the limits are multiplied.
>> For example, this job:
>> #$ -pe mpi 4
>> #$ -l h_vmem=128M
>> #$ -l h_stack=32M
>> echo ulimit -d
>> ulimit -d
>> echo ulimit -s
>> ulimit -s
>>
>> This produces the following output:
>> ulimit -d
>> 524288
>> ulimit -s
>> 131072
>>
>> The limits are multiplied by 4 (128 MB x 4 = 524288 kB, 32 MB x 4 =
>> 131072 kB, as ulimit reports in kB).
>> How can I avoid this behavior?
>>
>> We have Red Hat EL4, and we have defined a PE for our shared-memory 
>> programs (OpenMP). There a defined time limit (h_cpu) is also 
>> multiplied. The limit does not take effect on the process as a whole 
>> but on the individual threads, so the process usually runs 4 times 
>> too long.
>
> AFAIK:
>
> Using Forks:
>
> You are right that each fork will get the multiplied limit, but SGE 
> will keep track of the accumulated consumption and kill the job.
>
> Using Threads:
>
> All CPU time is accumulated by the main thread, so the SGE and kernel 
> limits should behave the same, and the job will also be killed. You 
> could test this by running the program interactively and watching the 
> consumed CPU time in top, which should advance faster than your watch.
>
On our nodes we run Red Hat EL3 with kernel 2.4.21, and there the 
defined time limit only takes effect on the individual running threads. 
So we must divide the limit back to its original per-slot value in our 
system profile.
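
For the archives, a rough sketch of what such a snippet in the system 
profile could look like, assuming SGE exports $NSLOTS into the job 
environment and the shell is bash (the choice of limits is only 
illustrative, this is not a tested recipe):

# divide the multiplied limits back to the per-slot values
if [ -n "$NSLOTS" ] && [ "$NSLOTS" -gt 1 ]; then
    for flag in t d s; do      # t = CPU seconds, d = data kB, s = stack kB
        cur=$(ulimit -$flag)
        if [ "$cur" != "unlimited" ]; then
            # lower the soft limit to the value requested per slot
            # with -l h_cpu / h_vmem / h_stack
            ulimit -S -$flag $((cur / NSLOTS))
        fi
    done
fi

Since this only lowers the soft limits, it is always permitted inside 
the job shell.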

-- 
 Gerd Marquardt
 




