[GE users] weird qlogin problem on cluster

Yuan Wan ywan at ed.ac.uk
Thu Nov 8 12:00:47 GMT 2007


Hi all,

I hope someone can give me a hint about my qlogin problem.

Our cluster has two frontend nodes from which users are allowed to qlogin to 
the interactive work nodes. Until now, qlogin worked fine from both.

But yesterday I found that one of the login nodes (frontend02) has a problem 
with qlogin: the qlogin wrapper is not called after the scheduler allocates a 
slot on a work node. The qlogin procedure just hangs there with the message 
'timeout (3 s) expired while waiting on socket fd 4'. The other frontend 
node works fine with qlogin.

-----------------------------------------------------------------------
[ywan at frontend01 ~]$ qlogin
local configuration frontend01.ecdf.ed.ac.uk not defined - using global configuration
Your job 4180551 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 4180551 has been successfully scheduled.
Establishing /usr/local/Cluster-Apps/sge/cvos/qlogin_wrapper session to host node001.beowulf.cluster ...
Last login: Thu Nov  8 11:01:03 2007 from frontend01.ecdf.ed.ac.uk
[ywan at node001 ~]$

-----------------------------------------------------------------------
[ywan at frontend02 ~]$ qlogin
local configuration frontend02.ecdf.ed.ac.uk not defined - using global configuration
Your job 4180550 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...timeout (3 s) expired while waiting on socket fd 4

Your interactive job 4180550 has been successfully scheduled.
timeout (5 s) expired while waiting on socket fd 4

Your interactive job 4180550 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4

Your interactive job 4180550 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4

Your interactive job 4180550 has been successfully scheduled.
timeout (3 s) expired while waiting on socket fd 4

Your interactive job 4180550 has been successfully scheduled.
timeout (4 s) expired while waiting on socket fd 4
...
-----------------------------------------------------------------------

I ran this test with the firewall disabled, and I also rebooted both 
frontends with the same image. I even built a new queue for testing. All of 
these tests gave the same results as above.


Does anyone know what the possible reason could be?


--Yuan


Yuan Wan
-- 
Unix Section
Information Services Infrastructure Division
University of Edinburgh

tel: 0131 650 4985
email: ywan at ed.ac.uk

2032 Computing Services, JCMB
The King's Buildings,
Edinburgh, EH9 3JZ
