[GE users] Java DRMAA Error : can't send response for this message id - protocol error ?

umanga aumanga at biggjapan.com
Mon Dec 28 04:56:10 GMT 2009



Greetings ,

I was wondering whether this issue is fixed in SGE 6.5?

Best Regards,
umanga
crei wrote:

Yes,

the function cl_commlib_receive_message() in the file cl_commlib.c:

There are two places where CL_RETVAL_PROTOCOL_ERROR is returned. This generates
the error message you see: "can't send response for this message id
- protocol error".

The problem is that the check "if ( response_mid > connection->last_send_message_id) "
does not handle the wrap-around of the message ids, which happens at 65535:
after that, the message id is reset to 1.
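A minimal Java sketch of the wrap-around problem, assuming (as stated in this thread) that message ids run from 1 to 65535 and then restart at 1. This is not the commlib code, which is C; the class and method names are hypothetical. The wrap-aware rule uses serial-number-style arithmetic (accept a response id only if it lies less than half the id cycle behind the last id sent), which is one common way to compare wrapping counters, not necessarily the fix that went into Grid Engine.

```java
// Hypothetical illustration only: models the message id check discussed in
// this thread, not the real cl_commlib.c implementation.
final class MessageIdWrap {

    // Assumption from the thread: ids cycle 1, 2, ..., 65535, then back to 1.
    static final int MAX_MID = 65535;

    // Next id after mid, wrapping 65535 -> 1.
    static int nextMid(int mid) {
        return mid >= MAX_MID ? 1 : mid + 1;
    }

    // Naive rule: a response id greater than the last id sent is an error.
    static boolean naiveIsProtocolError(int responseMid, int lastSentMid) {
        return responseMid > lastSentMid;
    }

    // Wrap-aware rule: measure how far the response id lies behind the last
    // id sent, counting around the 65535-id cycle. Less than half the cycle
    // means "a recent request"; anything else is rejected.
    static boolean wrapAwareIsProtocolError(int responseMid, int lastSentMid) {
        int behind = Math.floorMod(lastSentMid - responseMid, MAX_MID);
        return behind >= MAX_MID / 2;
    }

    public static void main(String[] args) {
        int responseMid = MAX_MID;          // request sent just before the wrap
        int lastSentMid = nextMid(MAX_MID); // counter has since wrapped to 1
        System.out.println("naive:      " + naiveIsProtocolError(responseMid, lastSentMid));
        System.out.println("wrap-aware: " + wrapAwareIsProtocolError(responseMid, lastSentMid));
    }
}
```

Running it reports true for the naive check (the spurious "protocol error" for mid=65535) and false for the wrap-aware one, while a response id ahead of the last id sent (for example 101 after 100) is still rejected.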

As a workaround, you can simply remove the code paths that return
CL_RETVAL_PROTOCOL_ERROR, and it should work.

I will work on a better solution for this ...

Regards,

Christian




On 12/08/09 11:05, umanga wrote:


Greetings ,

I took a peek at 'cl_commlib.c' and couldn't figure out what to look
into :)
Any tips on where to start?

regards
umanga

crei wrote:


Hi,

this does indeed look like a wrap-around problem in the commlib. The maximum
message id is defined as 65535. I will try to reproduce this ...

Regards,

Christian



templedf wrote:



I didn't think there was, but that exception sounds like there might
be.  If one of the developers doesn't chime in, I'll look into it myself.

Daniel

umanga wrote:




Hi Daniel,

Thanks for the reply.
Is there a limit on the number of jobs I can submit using a single Session? I
use the same session for the entire run of my application, which
submits more than 30,000 jobs to SGE.


regards,

templedf wrote:




Hmmm... Never seen that one before. The fact that the message id is 65535,
the maximum value of an unsigned 16-bit integer, makes me a little suspicious.
I think you may have overflowed the comm lib. :) Reisi, care to take a peek?
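A tiny Java check of why 65535 stands out: it is 2^16 - 1, the largest unsigned 16-bit value, not the maximum of a 32-bit int. The class name SixteenBitLimit is hypothetical; only the value 65535 comes from this thread.

```java
// Hypothetical helper: shows that 65535 is the 16-bit limit, not max int.
final class SixteenBitLimit {

    static final int MAX_MID = 65535; // max. message id quoted in the thread

    public static void main(String[] args) {
        System.out.println(MAX_MID == 0xFFFF);              // all 16 bits set
        System.out.println(MAX_MID == (1 << 16) - 1);       // 2^16 - 1
        System.out.println(MAX_MID == Character.MAX_VALUE); // Java's char is unsigned 16-bit
        System.out.println(Integer.MAX_VALUE);              // 32-bit signed max, far larger
    }
}
```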

Daniel

umanga wrote:





Greetings all,

I am submitting a huge number of jobs using DRMAA. I am not using
runBulkJobs(); I just submit one job at a time using:

    JobTemplate jt = sgeSession.createJobTemplate();
    jt.setArgs(job.getArgs());
    jt.setNativeSpecification(job.getNativeCommand());
    jt.setWorkingDirectory(job.getWorkDir());
    jt.setRemoteCommand(job.getWorkDir() + File.separator
            + job.getRemoteCommand());

    for (IJobHandler h : jobHandlers) {
        h.beforeJobSubmit(job);
    }

    String sgeid = sgeSession.runJob(jt);
    sgeSession.deleteJobTemplate(jt);


I get the following error partway through my program's execution
(a single run takes about 2 days to finish).

Any tips?
Regards
umanga.

Caused by: org.ggf.drmaa.DrmCommunicationException: failed receiving gdi
request response for mid=65535 (can't send response for this message id
- protocol error).
    at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
    at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)
    at com.bigg.metagenome.grid.QueuedJobDispatcher.submitJob(QueuedJobDispatcher.java:60)
    ... 12 more

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231929

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].






------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231934






------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231953














More information about the gridengine-users mailing list