[GE users] Jumbo Frame and Gridengine

hargitai joseph.hargitai at nyu.edu
Wed Jul 8 19:09:28 BST 2009


When you enable jumbo frames, the node drops out of SGE in about a minute. qstat -f shows the node in state au, and restarting the SGE client on the node itself does not work.


When you unset jumbo frames, the node becomes available right away without communication.

This is what the SGE messages file on the node shows while jumbo frames are enabled:

(No route to host)
07/08/2009 11:57:23|execd|compute-8-8|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qmaster/1")
07/08/2009 11:57:23|execd|compute-8-8|W|can't register at "qmaster": unable to contact qmaster using port 536 on host "cardiac.es.its.nyu.edu"
07/08/2009 12:04:44|execd|compute-8-8|W|can't register at "qmaster": unable to send message to qmaster using port 536 on host "cardiac.es.its.nyu.edu": got message ackno
07/08/2009 12:48:36|execd|compute-8-8|I|controlled shutdown 6.1u4
07/08/2009 12:54:08|execd|compute-8-8|I|starting up GE 6.1u4 (lx26-amd64)
07/08/2009 13:53:58|execd|compute-8-8|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qmaster/1")
07/08/2009 13:56:43|execd|compute-8-8|E|commlib error: endpoint is not unique error (endpoint "cardiac.es.its.nyu.edu/qmaster/1" is already connected)
07/08/2009 13:57:43|execd|compute-8-8|E|acknowledge for unknown job 8188.1/master
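
For anyone sifting through a longer messages file, here is a minimal Python sketch that splits entries of the pipe-separated form shown above (timestamp|daemon|host|level|text) and keeps only the error lines. The field names, the SgeLogEntry type, and the reading of E/W/I as error/warning/info are my own assumptions inferred from the excerpt, not anything taken from the SGE sources.

```python
from typing import List, NamedTuple, Optional


class SgeLogEntry(NamedTuple):
    # Field names are hypothetical labels for the five pipe-separated columns.
    timestamp: str
    daemon: str
    host: str
    level: str   # assumed: "E" error, "W" warning, "I" info
    message: str


def parse_line(line: str) -> Optional[SgeLogEntry]:
    """Split one messages-file line; return None if it does not have 5 fields."""
    # maxsplit=4 keeps any "|" inside the message text intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return SgeLogEntry(*parts)


def errors_only(lines: List[str]) -> List[SgeLogEntry]:
    """Return the parsed entries whose level column is "E"."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == "E"]


if __name__ == "__main__":
    sample = [
        "(No route to host)",
        "07/08/2009 12:54:08|execd|compute-8-8|I|starting up GE 6.1u4 (lx26-amd64)",
        "07/08/2009 13:57:43|execd|compute-8-8|E|acknowledge for unknown job 8188.1/master",
    ]
    for entry in errors_only(sample):
        print(entry.timestamp, entry.message)
```

Lines that do not match the five-field layout, such as the truncated "(No route to host)" fragment, are skipped rather than guessed at.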

j

----- Original Message -----
From: rayson <rayrayson at gmail.com>
Date: Wednesday, July 8, 2009 1:58 pm
Subject: Re: [GE users] Jumbo Frame and Gridengine

> Did you get anything in the log files or "messages"??
> 
> As a test, can you enable jumbo frames and run some client commands,
> like qhost and qstat and see if you get any response from qmaster??
> 
> Looking at the commlib code, we have CL_DEFINE_DATA_BUFFER_SIZE
> defined to 1024 * 4. However, 4K is smaller than the size of a jumbo
> frame, which can be as big as 9KB. Note that 4K is used as the size of
> the read buffer and the write buffer (libs/comm/cl_communication.c).
> 
> My socket programming is a bit rusty, and I forgot how ethernet frames
> get assembled into TCP segments and presented to applications... I may
> need to do a bit of googling to see how it affects user-applications.
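
On the buffer-size question above: a stream socket hands the application a byte stream, so a reader with a 4K buffer can still consume a larger payload by calling recv() repeatedly. The Python sketch below illustrates only that general point. It is not commlib code; the local socketpair standing in for a TCP connection, the 9000-byte payload, and the names read_exactly and demo are assumptions made for illustration. It does not rule out other problems in commlib or on the network path.

```python
import socket
import threading

BUFFER_SIZE = 4 * 1024  # the 4K value quoted for CL_DEFINE_DATA_BUFFER_SIZE
PAYLOAD_SIZE = 9000     # assumed: about one jumbo frame's worth of bytes


def read_exactly(sock, total, bufsize=BUFFER_SIZE):
    """Read `total` bytes using recv() calls of at most `bufsize` bytes each."""
    chunks = []
    received = 0
    while received < total:
        chunk = sock.recv(min(bufsize, total - received))
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks), len(chunks)


def demo():
    """Send PAYLOAD_SIZE bytes over a local stream socket and read them back."""
    payload = (bytes(range(256)) * 36)[:PAYLOAD_SIZE]
    # socketpair() gives two connected stream sockets; it stands in for TCP here.
    reader, writer = socket.socketpair()
    sender = threading.Thread(target=writer.sendall, args=(payload,))
    sender.start()
    try:
        data, reads = read_exactly(reader, PAYLOAD_SIZE)
    finally:
        sender.join()
        reader.close()
        writer.close()
    return data == payload, reads


if __name__ == "__main__":
    intact, reads = demo()
    print("payload intact:", intact, "recv calls:", reads)
```

Because each recv() is capped at 4096 bytes, the 9000-byte payload needs at least three calls, and the data still arrives intact. That would suggest the 4K buffer by itself need not break on jumbo frames, although this is only a sketch of stream semantics, not a test of SGE.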
> 
> Rayson
> 
> 
> 
> 
> On 7/8/09, hargitai <joseph.hargitai at nyu.edu> wrote:
> > Hey all:
> >
> > We enabled jumbo frames on our cluster and SGE services stopped communicating on eth0 - while ssh was/is working.
> >
> > Once jumbo frames were unset - SGE picked up and worked again.
> >
> > Is there a way to have SGE collaborate with jumbo frame settings?
> >
> > thanks,
> > joseph
> >
> > ------------------------------------------------------
> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206185
> >
> > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> >
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206196



