[GE users] Jumbo Frame and Gridengine

hargitai joseph.hargitai at nyu.edu
Wed Jul 8 19:18:08 BST 2009


from master node qmaster:

07/08/2009 13:56:44|qmaster|cardiac|E|commlib error: endpoint is not unique error (endpoint "cardiac.es.its.nyu.edu/qmaster/1" is already connected)
07/08/2009 13:56:46|qmaster|cardiac|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qstat/8395")
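
The log lines in this thread all appear to share one pipe-delimited layout: timestamp, daemon, host, a one-letter severity, then the message text. Below is a minimal sketch for pulling out only the error lines, assuming that five-field layout holds and that E, W and I stand for error, warning and info. The names LogEntry, parse_line and errors_only are made up for illustration and are not part of SGE.

```python
from collections import namedtuple

# Hypothetical record type for one line of an SGE "messages" file.
LogEntry = namedtuple("LogEntry", "date time daemon host level text")


def parse_line(line):
    """Split one log line into fields; return None if it does not fit the layout."""
    # Split on the first four pipes only, so any pipe inside the message survives.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp = parts[0].split()
    if len(stamp) != 2:
        return None
    return LogEntry(stamp[0], stamp[1], parts[1], parts[2], parts[3], parts[4])


def errors_only(lines):
    """Return the parsed entries whose severity letter is E."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level == "E"]
```

Fragments such as "(No route to host)", which are the tail of a cut-off line, do not match the layout and are skipped rather than guessed at.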


Is the new SGE version released? 

best,

j

----- Original Message -----
From: rayson <rayrayson at gmail.com>
Date: Wednesday, July 8, 2009 2:13 pm
Subject: Re: [GE users] Jumbo Frame and Gridengine

> Anything from the qmaster's side??
> 
> Rayson
> 
> 
> 
> On 7/8/09, hargitai <joseph.hargitai at nyu.edu> wrote:
> > When you enable jumbo frames, the node drops off SGE in about a minute. qstat -f shows the node in state au, and restarting the SGE client on the node itself does not work.
> >
> >
> > When you unset jumbo frames, the node becomes available right away without communication.
> >
> > this is the message in sge messages on the node while on jumbo frame:
> >
> > (No route to host)
> > 07/08/2009 11:57:23|execd|compute-8-8|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qmaster/1")
> > 07/08/2009 11:57:23|execd|compute-8-8|W|can't register at "qmaster": unable to contact qmaster using port 536 on host "cardiac.es.its.nyu.edu"
> > 07/08/2009 12:04:44|execd|compute-8-8|W|can't register at "qmaster": unable to send message to qmaster using port 536 on host "cardiac.es.its.nyu.edu": got message ackno
> > 07/08/2009 12:48:36|execd|compute-8-8|I|controlled shutdown 6.1u4
> > 07/08/2009 12:54:08|execd|compute-8-8|I|starting up GE 6.1u4 (lx26-amd64)
> > 07/08/2009 13:53:58|execd|compute-8-8|E|commlib error: got read error (closing "cardiac.es.its.nyu.edu/qmaster/1")
> > 07/08/2009 13:56:43|execd|compute-8-8|E|commlib error: endpoint is not unique error (endpoint "cardiac.es.its.nyu.edu/qmaster/1" is already connected)
> > 07/08/2009 13:57:43|execd|compute-8-8|E|acknowledge for unknown job 8188.1/master
> >
> > j
> >
> > ----- Original Message -----
> > From: rayson <rayrayson at gmail.com>
> > Date: Wednesday, July 8, 2009 1:58 pm
> > Subject: Re: [GE users] Jumbo Frame and Gridengine
> >
> > > Did you get anything in the log files or "messages"??
> > >
> > > As a test, can you enable jumbo frames and run some client commands,
> > > like qhost and qstat and see if you get any response from qmaster??
> > >
> > > Looking at the commlib code, we have CL_DEFINE_DATA_BUFFER_SIZE
> > > defined to 1024 * 4. However, 4K is smaller than the size of a jumbo
> > > frame, which can be as big as 9KB. Note that 4K is used as the size of
> > > the read buffer and the write buffer (libs/comm/cl_communication.c).
> > >
> > > My socket programming is a bit rusty, and I forgot how ethernet frames
> > > get assembled into TCP segments and presented to applications... I may
> > > need to do a bit of googling to see how it affects user-applications.
> > >
> > > Rayson
> > >
> > >
> > >
> > >
> > > On 7/8/09, hargitai <joseph.hargitai at nyu.edu> wrote:
> > > > Hey all:
> > > >
> > > > We enabled jumbo frames on our cluster and SGE services stopped
> > > > communicating on eth0 - while ssh was/is working.
> > > >
> > > > Once jumbo frames were unset - SGE picked up and worked again.
> > > >
> > > > Is there a way to have SGE collaborate with jumbo frame settings?
> > > >
> > > > thanks,
> > > > joseph
> > > >
> > > > ------------------------------------------------------
> > > > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206185
> > > >
> > > > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> > > >
> > >
> > > ------------------------------------------------------
> > > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206193
> >
> > ------------------------------------------------------
> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206196
> >
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206200

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206201

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
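
On the question quoted above about a 4K read buffer versus a 9KB jumbo frame: TCP hands applications a byte stream, so Ethernet frame boundaries are not visible to the reader, and a receiver can collect a larger payload in several smaller reads. The sketch below illustrates only that general behaviour over loopback. It does not exercise real jumbo-frame NICs or switches, and it is not how commlib itself reads. READ_BUFFER_SIZE mirrors the 1024 * 4 value mentioned in the thread; PAYLOAD_SIZE, recv_exact and demo are names invented for this example.

```python
import socket
import threading

READ_BUFFER_SIZE = 4 * 1024  # mirrors the 1024 * 4 value quoted in the thread
PAYLOAD_SIZE = 9000          # assumption: about one jumbo frame's worth of data


def recv_exact(conn, total, bufsize=READ_BUFFER_SIZE):
    """Read exactly `total` bytes, never requesting more than `bufsize` per call."""
    chunks = []
    remaining = total
    while remaining > 0:
        chunk = conn.recv(min(bufsize, remaining))
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    """Send PAYLOAD_SIZE bytes over loopback TCP and read them back in small pieces."""
    payload = bytes(i % 251 for i in range(PAYLOAD_SIZE))
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def send():
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(payload)

    sender = threading.Thread(target=send)
    sender.start()
    conn, _ = server.accept()
    with conn:
        received = recv_exact(conn, PAYLOAD_SIZE)
    sender.join()
    server.close()
    return received == payload
```

If demo() returns True, the 9000-byte payload arrived intact through reads of at most 4096 bytes, which suggests the buffer size alone need not conflict with jumbo frames. Whether that holds for commlib on a real jumbo-frame network is not something this sketch can show.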


