[GE users] SGE 6.2u2 qrsh timeout

pollinger harald.pollinger at sun.com
Wed Jun 10 15:56:50 BST 2009


On 06/10/09 11:51, adary wrote:
> We found the problem.
> 
> The timeout happened only when the sessions ran over the WAN
> between our two sites. Our network was configured to close all
> sessions that were idle for 4 hours or more. The problem is that
> the builtin mechanism for qrsh doesn't keep the connection
> alive, unlike the ssh setup that we used in 6.1.

It shouldn't be a problem to add such a keepalive message. I'll file an
RFE for this.
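
For illustration only, and not the qrsh/commlib implementation: one
generic way to stop an idle TCP session from being dropped by a network
device is to enable TCP keepalive probes on the socket. The Python
sketch below assumes a Linux-style socket API. The helper name
enable_keepalive and its default timings are hypothetical, chosen so
that probes start well before a 4 hour idle limit.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Ask the kernel to send TCP keepalive probes on an idle socket.

    Hypothetical helper; the timing defaults are assumptions, not values
    taken from SGE.
    """
    # Portable switch that turns keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning options below are platform specific (Linux names), so
    # each one is applied only if this Python build exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

With settings like these the kernel sends a small probe after ten idle
minutes, which is usually enough for a firewall or WAN device to treat
the session as active.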

Regards,
Harald

> I checked this with tcpdump, and the only communication between the
> terminal and the application is when there is actual activity sent to
> the terminal.
> 
> The problem with this is that in certain cases we have sessions that
> appear idle for a long time while they are crunching numbers, and
> this can last longer than 4 hours.
> 
> The best solution would be for qrsh to keep the session alive.
> 
> -----Original Message-----
> From: adary [mailto:adary at marvell.com]
> Sent: Wednesday, June 10, 2009 9:37 AM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] SGE 6.2u2 qrsh timeout
> 
> Roughly, hours.
> 
> Users complained that they left their magma or primetime shells open,
> and when they came back in the morning the sessions were reset with
> the message I posted.
> 
> -----Original Message-----
> From: Harald.Pollinger at Sun.COM [mailto:Harald.Pollinger at Sun.COM]
> Sent: Tuesday, June 09, 2009 7:58 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE 6.2u2 qrsh timeout
> 
> Internally there are several timers that should not time out as long
> as the connection is working properly. Do you know how long the
> connection was idle? Even a rough estimate could help - was it
> minutes, hours or days?
> 
> Regards, Harald
> 
> adary wrote:
>> It seems that there is a timeout in qrsh sessions that go through
>> the new built-in mechanism in SGE 6.2u2.
>> 
>> We experienced unexpected interruptions of idle qrsh sessions with
>> the error message:
>> 
>> mantle[20]> error: commlib error: got select error (Connection timed out)
>> 
>> This never happens in an active session that is using CPU.
>> 
>> Is there a way to disable this timeout? I haven't found anything
>> relevant in the documentation.
>> 
>> Cheers,
>> 
>> Y.
>> 
>> ________________________________
>> Yuval Adar, Marvell Israel - Senior UNIX System Administrator
>> 6 Hamada Street, Mordot HaCarmel Industrial Park, Yokneam, 20692, Israel
>> Email: adary at marvell.com
>> Office: +972.4.9091188 - OnNet: 704.1188
>> Fax: +972.4.9091501
>> Mobile: +972.54.2493958
>> Web site: http://www.marvell.com
>> 
>> ------------------------------------------------------ 
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201288
>> 
>> 
>> To unsubscribe from this discussion, e-mail:
>> [users-unsubscribe at gridengine.sunsource.net].


-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         Sun Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201443
