[GE dev] DRMAA crash

ppetosy ppetrosy at loni.ucla.edu
Tue Jul 7 19:30:43 BST 2009


We use DRMAA to submit jobs and check their status. Our application uses drmaa.jar, and everything normally works fine. Occasionally, however, DRMAA crashes for a reason we cannot determine and takes our application down with it. When I print the stack from the core file, I get the following:

#0  0x0000002ba3d4fd80 in cl_message_list_get_first_elem () from /ifshome/sge6.2u1/lib/lx24-amd64/libdrmaa.so.1.0
#1  0x0000002ba3d60120 in cl_commlib_app_message_queue_cleanup () from /ifshome/sge6.2u1/lib/lx24-amd64/libdrmaa.so.1.0
#2  0x0000002ba3d5e724 in cl_com_handle_service_thread () from /ifshome/sge6.2u1/lib/lx24-amd64/libdrmaa.so.1.0
#3  0x0000003321a06137 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000033211c9883 in clone () from /lib64/tls/libc.so.6

Could you please investigate why this happens? The cause may be that our application fails to supply some required argument or data, but we have not been able to identify it.

I have also noticed that we get a lot of "GDI mismatch" and "failed receiving gdi request response for mid=65535 (can't send response for this message id - protocol error)." exceptions from DRMAA. I don't know whether these are related to the crash.

I uploaded the core file of the crash here

Please have a look and let me know. 

Thank you very much!


