[GE users] pthread job killed by SGE

Reuti reuti at staff.uni-marburg.de
Mon Nov 26 12:57:16 GMT 2007


Hi,

On 26.11.2007, at 13:09, Alois Dirnaichner wrote:

> One of our scientists wrote a program with a single worker pthread
> (started from the main thread) to do his calculations (he wants to
> scale it for multiprocessors later).
> If he submits it using qsub job.ini.sge, the binary is started, but the
> memory for the worker pthread cannot be allocated (error 12).
> job.ini.sge:
>
> #$ -l h_vmem=1G
> #$ -l s_rt=6:0:0
> ./job.ini
>
> Our grid (SGE 6.1) runs with two forced flags, s_rt and h_vmem; h_vmem
> is set on each exechost to RAM minus 500 MB.
> SGE regards the job as completed: no error is reported, nor does any
> limit seem to be exceeded.

This might be a race condition: it depends on whether the kernel sees
the memory violation first, or SGE.

>
> If he uses qsub -l h_vmem=1G,s_rt=6:0:0 job.ini
> he gets
>
> Unable to run job: Script length does not match declared length.
> Exiting

Error 12 could mean "Cannot allocate memory" (ENOMEM). As h_vmem is a
hard limit, does the job run without an enforced limit?

Rather than searching the include files, I use this to get the error
message:

#include <errno.h>
#include <stdio.h>

int main(void)
{
     errno = 12;          /* error number to look up */
     perror("I face");    /* prints "I face: " followed by the message */
     return 0;
}

-- Reuti


> What is happening? And how can I enable jobs like this?
> Thanks in advance,
>
> Alois Dirnaichner
> Computing Maintenance Group, ASC
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





More information about the gridengine-users mailing list