[GE users] Jumbo Frame and Gridengine

rayson rayrayson at gmail.com
Wed Jul 8 19:13:11 BST 2009


Anything from the qmaster's side??

Rayson



On 7/8/09, hargitai <joseph.hargitai at nyu.edu> wrote:
> When you enable jumbo frames, the node drops out of SGE within about a minute: qstat -f shows the node in state "au", and restarting the SGE client on the node itself does not help.
>
>
> When you unset jumbo frames, the node becomes available again right away, without any other intervention.
>
> This is what appears in the SGE messages file on the node while jumbo frames are enabled:
>
> (No route to host)
> 07/08/2009 11:57:23|execd|compute-8-8|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qmaster/1")
> 07/08/2009 11:57:23|execd|compute-8-8|W|can't register at "qmaster": unable to contact qmaster using port 536 on host "cardiac.es.its.nyu.edu"
> 07/08/2009 12:04:44|execd|compute-8-8|W|can't register at "qmaster": unable to send message to qmaster using port 536 on host "cardiac.es.its.nyu.edu": got message ackno
> 07/08/2009 12:48:36|execd|compute-8-8|I|controlled shutdown 6.1u4
> 07/08/2009 12:54:08|execd|compute-8-8|I|starting up GE 6.1u4 (lx26-amd64)
> 07/08/2009 13:53:58|execd|compute-8-8|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qmaster/1")
> 07/08/2009 13:56:43|execd|compute-8-8|E|commlib error: endpoint is not unique error (endpoint "cardiac.es.its.nyu.edu/qmaster/1" is already connected)
> 07/08/2009 13:57:43|execd|compute-8-8|E|acknowledge for unknown job 8188.1/master
>
> j
>
> ----- Original Message -----
> From: rayson <rayrayson at gmail.com>
> Date: Wednesday, July 8, 2009 1:58 pm
> Subject: Re: [GE users] Jumbo Frame and Gridengine
>
> > Did you get anything in the log files or "messages"??
> >
> > As a test, can you enable jumbo frames and run some client commands,
> > like qhost and qstat and see if you get any response from qmaster??
> >
> > Looking at the commlib code, we have CL_DEFINE_DATA_BUFFER_SIZE
> > defined as 1024 * 4. However, 4K is smaller than a jumbo frame,
> > which can be as large as 9KB. Note that 4K is used as the size of
> > both the read buffer and the write buffer (libs/comm/cl_communication.c).
> >
> > My socket programming is a bit rusty, and I've forgotten how Ethernet
> > frames get assembled into TCP segments and presented to applications.
> > I may need to do a bit of googling to see how this affects user applications.
> >
> > Rayson
> >
> >
> >
> >
> > On 7/8/09, hargitai <joseph.hargitai at nyu.edu> wrote:
> > > Hey all:
> > >
> > > We enabled jumbo frames on our cluster and SGE services stopped communicating on eth0, while ssh was (and still is) working.
> > >
> > > Once jumbo frames were unset, SGE picked up and worked again.
> > >
> > > Is there a way to have SGE work with jumbo frames enabled?
> > >
> > > thanks,
> > > joseph
> > >
> > > ------------------------------------------------------
> > > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206185
> > >
> > > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> > >
> >
> > ------------------------------------------------------
> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206193
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206196
>
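
A note on the buffer-size question raised above: on a TCP socket the application never sees Ethernet frames. The kernel reassembles incoming segments into a byte stream, and a read with a 4K buffer simply returns at most 4K bytes, leaving the rest queued for the next read. The minimal Python sketch below illustrates this over a loopback TCP connection. It is not commlib code: `READ_BUF_SIZE` and `read_exactly` are hypothetical names, `READ_BUF_SIZE` only mirrors the 4K value of CL_DEFINE_DATA_BUFFER_SIZE, and the 9000-byte message is an assumed stand-in for a jumbo-frame-sized payload (the loopback interface has its own MTU, so no actual jumbo frames are involved).

```python
import socket

# Assumption: READ_BUF_SIZE mirrors the 4K CL_DEFINE_DATA_BUFFER_SIZE discussed
# above. This is NOT commlib code, just plain Python sockets.
READ_BUF_SIZE = 1024 * 4
# Assumption: a 9000-byte message, larger than the 4K buffer and roughly the
# payload of one jumbo frame.
MESSAGE_SIZE = 9000


def read_exactly(sock, total, bufsize=READ_BUF_SIZE):
    """Read `total` bytes from a TCP socket, never asking for more than
    `bufsize` bytes per recv() call. Returns the data and the recv() count."""
    chunks = []
    received = 0
    while received < total:
        data = sock.recv(min(bufsize, total - received))
        if not data:  # peer closed the connection early
            break
        chunks.append(data)
        received += len(data)
    return b"".join(chunks), len(chunks)


# Set up a TCP connection over loopback (port 0 lets the OS pick a free port).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

message = bytes(i % 256 for i in range(MESSAGE_SIZE))
client.sendall(message)  # small enough to fit in the kernel socket buffers
received, reads = read_exactly(conn, MESSAGE_SIZE)

for s in (client, conn, server):
    s.close()

# TCP delivers a byte stream: all 9000 bytes arrive intact, spread over at
# least three 4K-limited reads (4096 + 4096 + 808 in the best case).
print(len(received), received == message, reads >= 3)
```

Each recv() call is capped at 4096 bytes, so reading the 9000-byte message takes at least three calls, and the data still arrives intact. Whether commlib's own buffering in libs/comm/cl_communication.c behaves the same way is not something this sketch can show.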

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206200



