[GE users] SGE 6.2u2 qrsh timeout
harald.pollinger at sun.com
Wed Jun 10 15:56:50 BST 2009
On 06/10/09 11:51, adary wrote:
> We found the problem.
> The timeout happened only when the sessions went over the WAN
> between our two sites. Our network was configured to close all
> sessions that were idle for 4 hours or more. The problem here is that
> the builtin mechanism for qrsh doesn't keep the connection
> alive, unlike ssh, which we used in 6.1.
It shouldn't be a problem to add such a keep alive message. I'll file an
RFE for this.
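
For illustration only, and not Grid Engine's actual code: one generic way
to keep an otherwise idle TCP connection from being closed by a device
like the one described above is to enable TCP keepalive probes with an
idle time well under the 4 hour limit. A minimal Python sketch, where
enable_keepalive and its default values are hypothetical, and the
TCP_KEEP* options are only set where the platform exposes them:

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive probes for an existing TCP socket.

    idle_s, interval_s and probes are illustrative defaults (assumptions),
    chosen only so that probes start long before a 4 hour idle cutoff.
    """
    # Portable switch: ask the kernel to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-grained timing options are platform specific (Linux names).
    # Skip any option that this platform's socket module does not define.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # seconds of idle time before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before the connection is dropped
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Whether a given firewall counts keepalive probes as activity depends on
the device, so an application-level keep alive message may still be the
more reliable fix.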
> I checked this with tcpdump, and the only communication between the
> terminal and the application is when there is actual activity sent to
> the terminal.
> Problem with this is that in certain cases we have sessions that will
> appear idle for a long time while they are crunching numbers, and
> this can be longer than 4 hours.
> The best solution would be for qrsh to keep the session alive.
> -----Original Message-----
> From: adary [mailto:adary at marvell.com]
> Sent: Wednesday, June 10, 2009 9:37 AM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] SGE 6.2u2 qrsh
> Roughly, hours.
> Users complained that they left their magma or primetime shells open,
> and when they came back in the morning the sessions were reset with
> the message I posted.
> -----Original Message-----
> From: Harald.Pollinger at Sun.COM [mailto:Harald.Pollinger at Sun.COM]
> Sent: Tuesday, June 09, 2009 7:58 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SGE 6.2u2 qrsh timeout
> Internally there are several timers that should not time out as long
> as the connection is working properly. Do you know how long the
> connection was idle? Even a rough estimate could help - was it
> minutes, hours or days?
> Regards, Harald
> adary wrote:
>> It seems that there is a timeout in qrsh sessions that go through
>> the new builtin mechanism in SGE 6.2u2.
>> We experienced unexpected interruptions of idle qrsh sessions with
>> the error message:
>> mantle> error: commlib error: got select error (Connection timed out)
>> This never happens in an active session that is using CPU.
>> Is there a way to disable this timeout? I haven't found
>> anything relevant in the documentation.
>> ________________________________
>> Yuval Adar, Marvell Israel - Senior UNIX System Administrator
>> 6 Hamada Street Mordot HaCarmel Industrial Park Yokneam, 20692, Israel
>> Email: adary at marvell.com
>> Office: +972.4.9091188 - OnNet: 704.1188
>> Fax: +972.4.9091501 Mobile: +972.54.2493958
>> Web site: http://www.marvell.com
>> To unsubscribe from this discussion, e-mail:
>> [users-unsubscribe at gridengine.sunsource.net].
Sun Microsystems GmbH      Harald Pollinger
Dr.-Leo-Ritter-Str. 7      Sun Grid Engine Engineering
D-93049 Regensburg         Phone: +49 (0)941 3075-209 (x60209)
Germany                    Fax: +49 (0)941 3075-222 (x60222)
mailto:harald.pollinger at sun.com
Registered office:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Munich District Court (Amtsgericht Muenchen): HRB 161028
Managing directors: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Chairman of the Supervisory Board: Martin Haering