[GE users] Problem with FineTurbo Parallel Job running

Reuti reuti at staff.uni-marburg.de
Wed May 14 08:42:12 BST 2008


Hi,

On 14.05.2008 at 07:35, Thamizh wrote:

> We tried to run FineTurbo_74_2 jobs on a grid machine (RHEL-4, N1GE6
> installed, with 8 processors per node and 32GB memory). It throws a
> "p4_error: semget failed for setnum" error. We searched Google
> and cleaned up the memory left by previous FineTurbo jobs using
> the scripts below. We are still facing the same problem.
>
> /opt/apps/numeca/numeca8/fine82_1/_mpi/util/cleanipcs
> /opt/apps/numeca/fine83_1/_mpi/util/cleanipcs
> /opt/apps/numeca/fine74_2/_mpi/util/cleanipcs

The question is also: are you using its features? I mean, MPICH(1)
will create these ipcs when it was configured with --with-comm=shared
for compilation, but it will also allocate duplicate shared memory
segments if the hostlist is in the wrong format:

node01
node01
node02
node02

This format will create the shared memory segments for each process, but
not use them at all! To actually use them, the PE start_proc_args
procedure must provide a machinefile of the form:

node01:2
node02:2
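
As a rough illustration of the conversion described above, here is a minimal Python sketch. The function names are made up for this example and are not part of SGE or MPICH. The second function assumes the input has the host name in the first column and the slot count in the second, as in the $pe_hostfile that SGE hands to start_proc_args; check this against your installation before relying on it.

```python
from collections import Counter


def collapse_hostlist(hosts):
    """Turn a one-entry-per-process host list into 'host:count' lines."""
    # Counter keeps hosts in first-seen order (dict ordering, Python 3.7+).
    counts = Counter(hosts)
    return [f"{host}:{n}" for host, n in counts.items()]


def pe_hostfile_to_machinefile(pe_hostfile_text):
    """Build 'host:slots' lines from pe_hostfile-style text.

    Assumption: column 1 is the host name, column 2 the number of slots.
    Any further columns are ignored.
    """
    lines = []
    for raw in pe_hostfile_text.splitlines():
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lines.append(f"{fields[0]}:{int(fields[1])}")
    return lines


if __name__ == "__main__":
    print("\n".join(collapse_hostlist(["node01", "node01", "node02", "node02"])))
```

The output of either function could be written to the machinefile that the PE start script passes on to mpirun.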

The easiest would be to recompile MPICH(1) and your application to avoid
this ipcs stuff altogether (or move to Open MPI). It's not easy to track
which shared memory segment belongs to which job. If you have two
different jobs from one user on a node, the supplied cleanipcs will
simply remove all of them, and the second job would be in serious trouble.
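
To see why attribution is hard, it can help to group the IPC objects on a node by owner. The sketch below is only an illustration: it parses a whitespace-separated table with a header row, which is the layout I would expect from files such as /proc/sysvipc/sem on Linux, but the sample data here is invented and shortened, and the helper names are hypothetical. Note that the kernel records only a uid per object, not a job id, so two jobs of the same user end up in the same group.

```python
from collections import defaultdict


def parse_sysvipc(text):
    """Parse a whitespace-separated table whose first line is the header."""
    rows = text.strip().splitlines()
    if not rows:
        return []
    header = rows[0].split()
    # Map each data row onto the header names; skip empty lines.
    return [dict(zip(header, row.split())) for row in rows[1:] if row.strip()]


def ids_by_uid(entries, id_column):
    """Group the values of id_column by the 'uid' column."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["uid"]].append(entry[id_column])
    return dict(grouped)


# Invented sample data, shortened to a few columns for illustration only.
SAMPLE = """\
key semid perms nsems uid gid
0 32768 600 10 1001 100
0 65537 600 10 1001 100
0 98306 600 10 1002 100
"""

if __name__ == "__main__":
    print(ids_by_uid(parse_sysvipc(SAMPLE), "semid"))
```

On a real node one would read the text from the proc file (or from `ipcs` output) instead of SAMPLE; the grouping still cannot tell two jobs of one user apart.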

...continued in 2nd email...

-- Reuti

>
> Can you please let us know If there any solution to fix this problem?
>
> Regards,
> Thamizhannal P

