[GE users] Java DRMAA Error : can't send response for this message id - protocol error ?

crei crei at sun.com
Tue Dec 15 09:07:52 GMT 2009


I think

aimk -only-core commlib

compiles only the commlib. But since the lib is not dynamically linked, you
would also have to rebuild the drmaa part so that it picks up the change:

aimk -only-core drmaa_all

Hope this helps,

Regards,

Christian


On 12/15/09 07:37, umanga wrote:
> Greetings,
>
> Is there an easy way to compile only this library, without compiling the
> whole of SGE?
> 
> Regards
> umanga
> crei wrote:
>> Yes,
>>
>> function cl_commlib_receive_message() in file cl_commlib.c:
>>
>> There are two places where CL_RETVAL_PROTOCOL_ERROR is returned. This generates
>> the error message you see: "can't send response for this message id
>> - protocol error".
>>
>> The problem is that the check "if ( response_mid > connection->last_send_message_id) "
>> does not handle a wrap-around of the message ids, which happens at 65535: after
>> that, the message id is set to 1 again.
>>
>> You can simply remove the code parts where CL_RETVAL_PROTOCOL_ERROR is returned
>> and it should work.
>>
>> I will work on a better solution for this ...
>>
>> Regards,
>>
>> Christian
>>
>>
>>
>>
>> On 12/08/09 11:05, umanga wrote:
>>> Greetings,
>>>
>>> I took a peek at 'cl_commlib.c' and couldn't figure out what to look
>>> into :)
>>> Any tips to get started?
>>>
>>> regards
>>> umanga
>>>
>>> crei wrote:
>>>> Hi,
>>>>
>>>> this looks indeed like some wrap-around problem in the commlib. The max.
>>>> message id is defined as 65535. I will try to reproduce this ...
>>>>
>>>> Regards,
>>>>
>>>> Christian
>>>>
>>>>
>>>>
>>>> templedf schrieb:
>>>>> I didn't think there was, but that exception sounds like there might 
>>>>> be.  If one of the developers doesn't chime in, I'll look into it myself.
>>>>>
>>>>> Daniel
>>>>>
>>>>> umanga wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> Thanks for the reply.
>>>>>> Is there a limit on the number of jobs that I can submit using a single
>>>>>> Session? I am using the same session for the entire execution of my
>>>>>> application, which submits more than 30,000 jobs to SGE.
>>>>>>
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> templedf wrote:
>>>>>>> Hmmm...  Never seen that one before.  The message id is 65535, which is
>>>>>>> the max value of an unsigned 16-bit int, and that makes me a little
>>>>>>> suspicious.  I think you may have overflowed the comm lib. :)  Reisi,
>>>>>>> care to take a peek?
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> umanga wrote:
>>>>>>>> Greetings all,
>>>>>>>>
>>>>>>>> I am submitting a huge number of jobs using DRMAA. I am not using
>>>>>>>> runBulkJobs(), just submitting one job at a time using:
>>>>>>>>
>>>>>>>>     JobTemplate jt = sgeSession.createJobTemplate();
>>>>>>>>     jt.setArgs(job.getArgs());
>>>>>>>>     jt.setNativeSpecification(job.getNativeCommand());
>>>>>>>>     jt.setWorkingDirectory(job.getWorkDir());
>>>>>>>>     jt.setRemoteCommand(job.getWorkDir() + File.separator
>>>>>>>>             + job.getRemoteCommand());
>>>>>>>>
>>>>>>>>     for (IJobHandler h : jobHandlers) {
>>>>>>>>         h.beforeJobSubmit(job);
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     String sgeid = sgeSession.runJob(jt);
>>>>>>>>     sgeSession.deleteJobTemplate(jt);
>>>>>>>>
>>>>>>>>
>>>>>>>> I get the following error in the middle of my program's execution
>>>>>>>> (one run takes about 2 days to finish).
>>>>>>>>
>>>>>>>> Any tips?
>>>>>>>> Regards
>>>>>>>> umanga.
>>>>>>>>
>>>>>>>> Caused by: org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=65535 (can't send response for this message id - protocol error).
>>>>>>>>     at com.sun.grid.drmaa.SessionImpl.nativeRunJob(Native Method)
>>>>>>>>     at com.sun.grid.drmaa.SessionImpl.runJob(SessionImpl.java:349)
>>>>>>>>     at com.bigg.metagenome.grid.QueuedJobDispatcher.submitJob(QueuedJobDispatcher.java:60)
>>>>>>>>     ... 12 more
>>>>>>>>
>>>>>>>> ------------------------------------------------------
>>>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=231929
>>>>>>>>
>>>>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> 
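For readers following the wrap-around explanation quoted above, here is a
minimal sketch in Java of why a plain greater-than check misfires once the
message ids wrap from 65535 back to 1, together with one possible wrap-aware
alternative. This is not the commlib C code and not necessarily the fix that
went into Grid Engine. The class MessageIds, its method names, and the WINDOW
size of half the id range are assumptions made purely for illustration.

```java
// Illustrative sketch only: names and WINDOW are hypothetical, not commlib API.
final class MessageIds {
    // Largest message id mentioned in the thread; ids run 1..MAX_ID.
    static final int MAX_ID = 65535;
    // Assumed tolerance: a response may refer to any of the last WINDOW ids.
    static final int WINDOW = MAX_ID / 2;

    // Next id to send; wraps from MAX_ID back to 1 as described in the thread.
    static int next(int id) {
        return id >= MAX_ID ? 1 : id + 1;
    }

    // Mirrors the quoted check: a response id greater than the last sent id
    // is treated as a protocol error. This breaks right after a wrap.
    static boolean naiveIsValid(int responseMid, int lastSent) {
        return responseMid <= lastSent;
    }

    // Wrap-aware variant: measure how far responseMid lies behind lastSent
    // on the circular id space and accept it if that distance is small.
    static boolean wrapAwareIsValid(int responseMid, int lastSent) {
        int behind = Math.floorMod(lastSent - responseMid, MAX_ID);
        return behind < WINDOW;
    }

    public static void main(String[] args) {
        int lastSent = next(MAX_ID); // the counter has just wrapped to 1
        System.out.println(naiveIsValid(MAX_ID, lastSent));     // false
        System.out.println(wrapAwareIsValid(MAX_ID, lastSent)); // true
    }
}
```

With the last sent id at 1 just after the wrap, a late response carrying
mid=65535 is rejected by the naive check but accepted by the wrap-aware one.
Whether a half-range window is appropriate depends on how many requests can be
outstanding at once, which this sketch does not try to model.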

-- 
Sun Microsystems GmbH             Christian Reissmann
Dr.-Leo-Ritter-Str. 7             Software Engineer
D-93049 Regensburg                Phone: +49 (0)941 3075 112
Germany                           Fax:   +49 (0)941 3075 222
http://www.sun.de                 mailto: Christian.Reissmann at sun.com
                                   http://www.sun.com/gridengine
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233451
