[GE users] error messages from a parallel program

ppk ppk at ats.ucla.edu
Thu Jul 22 18:03:18 BST 2010


I have seen similar errors (though not this exact one) from 
InfiniBand (IB) devices in the past.  The error you are getting 
resembles the one discussed here:

http://www.open-mpi.org/community/lists/users/2009/03/8310.php
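
To read the two log lines from the original message more easily, here is a minimal Python sketch that pulls out the hosts, the completion status, and the signal. The names SAMPLE, IB_ERROR, MPIRUN_EXIT and summarize are hypothetical, and the patterns are modelled only on those two lines, so other Open MPI versions may format their messages differently.

```python
import re
import signal

# The two lines quoted in the original message (assumed representative).
SAMPLE = (
    "[0,1,0][btl_openib_component.c:1332:btl_openib_component_progress] "
    "from compute-0-15.local to: compute-0-2.local error polling HP CQ "
    "with status LOCAL QP OPERATION ERROR status number 2 "
    "for wr_id 1660537784 opcode 42\n"
    "mpirun noticed that job rank 1 with PID 13338 on node "
    "compute-0-15.local exited on signal 15 (Terminated).\n"
)

# Hypothetical patterns, written against the sample text above only.
IB_ERROR = re.compile(
    r"from (?P<src>\S+) to: (?P<dst>\S+) error polling \S+ CQ "
    r"with status (?P<status>.+?) status number (?P<num>\d+)"
)
MPIRUN_EXIT = re.compile(
    r"job rank (?P<rank>\d+) with PID (?P<pid>\d+) on node (?P<node>\S+) "
    r"exited on signal (?P<sig>\d+)"
)


def summarize(log_text):
    """Collect the fields of interest from the log text into one dict."""
    info = {}
    ib = IB_ERROR.search(log_text)
    if ib:
        info.update(src=ib["src"], dst=ib["dst"],
                    status=ib["status"], status_number=int(ib["num"]))
    ex = MPIRUN_EXIT.search(log_text)
    if ex:
        info.update(rank=int(ex["rank"]), pid=int(ex["pid"]),
                    exit_node=ex["node"],
                    signal_name=signal.Signals(int(ex["sig"])).name)
    return info


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

On the sample, the host that reported the IB error (the "from" host) and the node whose rank exited are both compute-0-15.local. Signal 15 is SIGTERM, a termination request delivered to the process, as opposed to a crash such as a segmentation fault.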

Prakashan


mad wrote:
> A student sent me her error messages on a run.   I assume this means that the portion of the program on compute-0-2 tried to send a message to the program on compute-0-15.  However, the program on compute-0-15 was no longer running.
> 
> [0,1,0][btl_openib_component.c:1332:btl_openib_component_progress] from compute-0-15.local to: compute-0-2.local error polling HP CQ with status LOCAL QP OPERATION ERROR status number 2 for wr_id 1660537784 opcode 42
> mpirun noticed that job rank 1 with PID 13338 on node compute-0-15.local exited on signal 15 (Terminated).
> 
> Is there a way to find out why the program on compute-0-15 ended before the program on compute-0-2 sent its message?
> 
> Sorry, I don't do parallel programming.  Thanks for any help you can give me.
