[GE users] error messages from a parallel program

mad margaret_Doll at brown.edu
Thu Jul 22 13:48:42 BST 2010


A student sent me the error messages from one of her runs. I assume they mean that the part of the program on compute-0-2 tried to send a message to the part of the program on compute-0-15, but the program on compute-0-15 was no longer running.

[0,1,0][btl_openib_component.c:1332:btl_openib_component_progress] from compute-0-15.local to: compute-0-2.local error polling HP CQ with status LOCAL QP OPERATION ERROR status number 2 for wr_id 1660537784 opcode 42
mpirun noticed that job rank 1 with PID 13338 on node compute-0-15.local exited on signal 15 (Terminated).
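
The mpirun line above already names the rank, PID, node, and signal involved, and signal 15 is SIGTERM on Linux. As a minimal sketch only, the fields can be pulled out like this. The function name parse_mpirun_exit is hypothetical, and the pattern assumes the message always has the form shown in this one sample:

```python
import re
import signal

# Pattern for the mpirun exit line quoted above.
# Assumption: the wording matches this single sample; other versions may differ.
EXIT_RE = re.compile(
    r"job rank (?P<rank>\d+) with PID (?P<pid>\d+) "
    r"on node (?P<node>\S+) exited on signal (?P<sig>\d+)"
)


def parse_mpirun_exit(line):
    """Return rank, PID, node, and signal from an mpirun exit line, or None."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    sig = int(match.group("sig"))
    try:
        # Map the signal number to its symbolic name on this platform.
        sig_name = signal.Signals(sig).name
    except ValueError:
        sig_name = "unknown"
    return {
        "rank": int(match.group("rank")),
        "pid": int(match.group("pid")),
        "node": match.group("node"),
        "signal": sig,
        "signal_name": sig_name,
    }


line = ("mpirun noticed that job rank 1 with PID 13338 on node "
        "compute-0-15.local exited on signal 15 (Terminated).")
info = parse_mpirun_exit(line)
print(info)
```

This only restates what the message says: rank 1 on compute-0-15.local was terminated by a signal sent from outside the process. It does not show who sent the signal.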

Is there a way to find out why the program on compute-0-15 ended before the program on compute-0-2 sent its message?

Sorry, I don't do parallel programming myself. Thanks for any help you can give me.





