[GE users] error messages from a parallel program
margaret_Doll at brown.edu
Thu Jul 22 13:48:42 BST 2010
A student sent me her error messages on a run. I assume this means that the portion of the program on compute-0-2 tried to send a message to the program on compute-0-15. However, the program on compute-0-15 was no longer running.
[0,1,0][btl_openib_component.c:1332:btl_openib_component_progress] from compute-0-15.local to: compute-0-2.local error polling HP CQ with status LOCAL QP OPERATION ERROR status number 2 for wr_id 1660537784 opcode 42
mpirun noticed that job rank 1 with PID 13338 on node compute-0-15.local exited on signal 15 (Terminated).
Is there a way to find out why the program on compute-0-15 ended before the program on compute-0-2 sent its message?
Sorry, I don't do parallel programming myself. Thanks for any help you can give me.