[GE users] Java DRMAA Error : can't send response for this message id - protocol error ?

crei crei at sun.com
Tue Dec 8 10:17:11 GMT 2009


Yes,

see the function cl_commlib_receive_message() in the file cl_commlib.c.

There are two places in it where CL_RETVAL_PROTOCOL_ERROR is returned. That
return value produces the error message you see: "can't send response for this
message id - protocol error".

The problem is that the check "if ( response_mid > connection->last_send_message_id) "
does not handle the wrap-around of the message ids, which happens at 65535. After
that, the message id is set back to 1.
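
To illustrate the wrap-around, here is a minimal standalone sketch in C. It is
not the actual commlib code: the helper names, the unsigned long type and the
"older half of the ring" rule in the wrap-aware variant are assumptions made
only for this example. Only the plain comparison and the 1..65535 id range come
from the description above.

```c
#include <assert.h>

/* Assumption: message ids run from 1 to 65535 and then restart at 1. */
#define MAX_MESSAGE_ID 65535UL

/* Hypothetical helper: the id that follows `mid`, wrapping from 65535 to 1. */
static unsigned long next_message_id(unsigned long mid) {
    assert(mid >= 1UL && mid <= MAX_MESSAGE_ID);
    return (mid == MAX_MESSAGE_ID) ? 1UL : mid + 1UL;
}

/* The plain comparison quoted above: a response id larger than the last
 * sent id is treated as a protocol error. Returns 1 if rejected. */
static int naive_check_rejects(unsigned long response_mid,
                               unsigned long last_send_message_id) {
    return response_mid > last_send_message_id;
}

/* Hypothetical wrap-aware variant: measure how far the response id lies
 * behind the last sent id on a ring of 65535 ids. Ids in the older half of
 * the ring are accepted, ids in the other half are treated as "not sent
 * yet" and rejected. Returns 1 if rejected. */
static int wrap_aware_check_rejects(unsigned long response_mid,
                                    unsigned long last_send_message_id) {
    unsigned long behind =
        (last_send_message_id + MAX_MESSAGE_ID - response_mid) % MAX_MESSAGE_ID;
    return behind >= MAX_MESSAGE_ID / 2UL;
}
```

With last_send_message_id == 1 right after the wrap, a response for mid 65535
is rejected by the plain comparison but accepted by the wrap-aware variant,
while a response for an id that has not been sent yet (for example 6 when the
last sent id is 5) is still rejected.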

As a workaround, you can simply remove the code parts where
CL_RETVAL_PROTOCOL_ERROR is returned, and it should work.

I will work on a better solution for this ...

Regards,

Christian




On 12/08/09 11:05, umanga wrote:
> Greetings,
> 
> I took a peek at 'cl_commlib.c' and couldn't figure out what to look 
> into :)
> Any tips to get started?
> 
> regards
> umanga
> 
> crei wrote:
>> Hi,
>>
>> This does indeed look like a wrap-around problem in the commlib. The maximum
>> message id is defined as 65535. I will try to reproduce this ...
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>> templedf wrote:
>>   
>>> I didn't think there was, but that exception sounds like there might 
>>> be.  If one of the developers doesn't chime in, I'll look into it myself.
>>>
>>> Daniel
>>>
>>> umanga wrote:
>>>   
>>>     
>>>> Hi Daniel,
>>>>
>>>> Thanks for the reply.
>>>> Is there a limit on the number of jobs that I can submit using a single 
>>>> Session? I am using the same session for the entire execution of my 
>>>> application, which submits more than 30,000 jobs to SGE.
>>>>
>>>>
>>>> regards,
>>>>
>>>> templedf wrote:
>>>>     
>>>>       
>>>>> Hmmm... Never seen that one before. The message id being 65535, which 
>>>>> is the max unsigned 16-bit int, makes me a little suspicious. I think 
>>>>> you may have overflowed the comm lib. :)  Reisi, care to take a peek?
>>>>>
>>>>> Daniel
>>>>>
>>>>> umanga wrote:
>>>>>   
>>>>>       
>>>>>         
>>>>>> Greetings all,
>>>>>>
>>>>>> I am submitting a huge number of jobs using DRMAA. I am not using 
>>>>>> runBulkJobs(), just submitting one job at a time using:
>>>>>>
>>>>>>     JobTemplate jt = sgeSession.createJobTemplate();
>>>>>>     jt.setArgs(job.getArgs());
>>>>>>     jt.setNativeSpecification(job.getNativeCommand());
>>>>>>     jt.setWorkingDirectory(job.getWorkDir());
>>>>>>     jt.setRemoteCommand(job.getWorkDir() + File.separator
>>>>>>             + job.getRemoteCommand());
>>>>>>
>>>>>>     for (IJobHandler h : jobHandlers) {
>>>>>>         h.beforeJobSubmit(job);
>>>>>>     }
>>>>>>
>>>>>>     String sgeid = sgeSession.runJob(jt);
>>>>>>     sgeSession.deleteJobTemplate(jt);
>>>>>>
>>>>>>
>>>>>> I get the following error in the middle of my program's execution 
>>>>>> (a single run takes about 2 days to finish).
>>>>>>
>>>>>> Any tips?
>>>>>> Regards
>>>>>> umanga.
>>>>>>
>>>>>> Caused by: org.ggf.drmaa.DrmCommunicationException: failed receiving gdi 
>>>>>> request response for mid=65535 (can't send response for this message id 
>>>>>> - protocol error).
>>>>>>     at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
>>>>>>     at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)
>>>>>>     at com.bigg.metagenome.grid.QueuedJobDispatcher.submitJob(QueuedJobDispatcher.java:60)
>>>>>>     ... 12 more
>>>>>>
>>>>>> ------------------------------------------------------
>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231929
>>>>>>
>>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>>
>>>>>>     
>>>>>>         
>>>>>>           
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231934
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231953
>>
>>   
> 

-- 
Sun Microsystems GmbH             Christian Reissmann
Dr.-Leo-Ritter-Str. 7             Software Engineer
D-93049 Regensburg                Phone: +49 (0)941 3075 112
Germany                           Fax:   +49 (0)941 3075 222
http://www.sun.de                 mailto: Christian.Reissmann at sun.com
                                   http://www.sun.com/gridengine
Registered office:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Munich District Court (Amtsgericht Muenchen): HRB 161028
Managing Directors: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Chairman of the Supervisory Board: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232193
