[GE users] SGE 6.2u2 qrsh timeout

adary adary at marvell.com
Wed Jun 10 10:51:48 BST 2009

We found the problem.

The timeout occurred only when sessions ran over the WAN between our two sites. Our network is configured to close any session that has been idle for 4 hours or more. The problem is that the builtin qrsh mechanism does not keep the connection alive, unlike the ssh that we used in 6.1.

I checked this with tcpdump: the only traffic between the terminal and the application occurs when there is actual activity sent to the terminal.

The problem is that in certain cases we have sessions that appear idle for a long time while they are crunching numbers, and this can last longer than 4 hours.
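
As a possible stopgap for such long, silent jobs, one could wrap the command so that something is written to the terminal at regular intervals. This is only a sketch: it assumes that any output sent to the terminal produces traffic on the qrsh connection (which matches the tcpdump observation above) and that the network device resets its idle timer on that traffic. The function name run_with_heartbeat and the 30-minute default are made up for illustration and are not part of SGE.

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval_s=1800, out=sys.stderr):
    """Run cmd and write a short marker to `out` every interval_s seconds.

    Hypothetical helper, not an SGE feature. Returns (exit_code, beats),
    where beats is the number of markers written while cmd was running.
    """
    proc = subprocess.Popen(cmd)
    beats = 0
    while True:
        try:
            # wait() returns the exit code as soon as the job finishes.
            return proc.wait(timeout=interval_s), beats
        except subprocess.TimeoutExpired:
            # Job is still running: emit a line so the session is not idle.
            out.write("[still running %s]\n" % time.strftime("%H:%M:%S"))
            out.flush()
            beats += 1


if __name__ == "__main__":
    # Example: python heartbeat.py some_long_job arg1 arg2
    exit_code, _ = run_with_heartbeat(sys.argv[1:])
    sys.exit(exit_code)
```

The interval only has to be comfortably shorter than the idle limit of the network (4 hours in our case). This does not help for interactive tools where extra lines on the terminal would be disruptive.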

The best solution would be for qrsh to keep the session alive.
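
To illustrate what "keeping the session alive" could mean at the TCP level, here is a minimal sketch that turns on kernel keepalive probes for a socket. It is not how qrsh is implemented and I am not aware of a qrsh option for it; the helper name enable_keepalive and the timing values are assumptions chosen only so that probes start well before a 4-hour idle limit. The three tuning options are platform-dependent, so the code sets them only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Ask the kernel to probe an idle TCP connection (illustrative helper).

    idle_s:     seconds of silence before the first probe
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-tuning constants are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With settings like these the probes themselves count as traffic, so a device that drops idle sessions should see the connection as active even when the application is silent.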

-----Original Message-----
From: adary [mailto:adary at marvell.com]
Sent: Wednesday, June 10, 2009 9:37 AM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] SGE 6.2u2 qrsh timeout

Roughly, hours.

Users complained that they left their magma or primetime shells open, and when they came back in the morning the sessions were reset with the message I posted.

-----Original Message-----
From: Harald.Pollinger at Sun.COM [mailto:Harald.Pollinger at Sun.COM]
Sent: Tuesday, June 09, 2009 7:58 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE 6.2u2 qrsh timeout

Internally there are several timers that should not time out as long as
the connection is working properly. Do you know how long the connection
was idle? Even a rough estimate could help - was it minutes, hours or days?


adary wrote:
> It seems that there is a timeout in qrsh sessions that go through the new builtin mechanism in SGE 6.2u2.
> We experienced unexpected interruptions of idle qrsh sessions with the error message:
> mantle[20]> error: commlib error: got select error (Connection timed out)
> This never happens in an active session that is using CPU.
> Is there a way to disable this timeout? I haven't found anything relevant in the documentation.
> Cheers,
> Y.
> ________________________________
> Yuval Adar, Marvell Israel - Senior UNIX System Administrator
> 6 Hamada Street
> Mordot HaCarmel Industrial Park
> Yokneam, 20692, Israel
> Email: adary at marvell.com
> Office:  +972.4.9091188 - OnNet: 704.1188
> Fax:      +972.4.9091501
> Mobile: +972.54.2493958
> Web site: http://www.marvell.com/
> ________________________________
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201288
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         Sun Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
mailto:harald.pollinger at sun.com
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering


To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


