[GE users] weird qlogin problem on cluster

Yuan Wan ywan at ed.ac.uk
Thu Nov 8 13:54:20 GMT 2007


>
> Am 08.11.2007 um 13:00 schrieb Yuan Wan:
>
>> I hope someone can give me a hint about my qlogin problem.
>> 
>> Our cluster has two frontend nodes that are allowed to qlogin to 
>> interactive work nodes. Until now, qlogin worked fine.
>> 
>> But yesterday I found that one of the login nodes (frontend02) has a problem 
>> with qlogin: the qlogin wrapper is not called after the scheduler allocates 
>> a slot on a work node, so the qlogin procedure just hangs with the message
>> 'timeout (3 s) expired while waiting on socket fd 4'. The other frontend 
>> node works fine with qlogin.
>
> So qlogin was also working from this node before?
>
> It could be that a local file system (e.g. /var or /tmp) is full, or that 
> the node lost its NFS mount for some reason.
>
> -- Reuti

The local file system is nearly empty, and the shared file system is fine.

--Yuan
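
The two checks suggested above (a full local file system, a lost NFS mount)
could be run on frontend02 roughly as follows. This is only a minimal Python
sketch: the helper names and the 95% threshold are assumptions made for
illustration and are not part of Grid Engine, and the NFS check assumes a
Linux host that provides /proc/mounts.

```python
import os
import shutil

# /var and /tmp come from the suggestion above; the threshold and the
# helper names are illustrative assumptions, not Grid Engine settings.
LOCAL_PATHS = ["/var", "/tmp"]
FULL_THRESHOLD = 95.0


def usage_percent(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo file systems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=FULL_THRESHOLD):
    """True if the file system holding `path` is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


def nfs_mounts(mounts_file="/proc/mounts"):
    """List the mount points of NFS file systems (Linux /proc/mounts format)."""
    points = []
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            # Fields are: device, mount point, file system type, options, ...
            if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
                points.append(fields[1])
    return points


if __name__ == "__main__":
    for path in LOCAL_PATHS:
        if os.path.exists(path):
            flag = "nearly full" if is_nearly_full(path) else "ok"
            print(f"{path}: {usage_percent(path):.1f}% used [{flag}]")
    if os.path.exists("/proc/mounts"):
        print("NFS mounts:", nfs_mounts() or "none found")
```

Running this on both frontends and comparing the output would show whether
frontend02 is missing a mount that frontend01 has.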


>
>> -----------------------------------------------------------------------
>> [ywan at frontend01 ~]$ qlogin
>> local configuration frontend01.ecdf.ed.ac.uk not defined - using global 
>> configuration
>> Your job 4180551 ("QLOGIN") has been submitted
>> waiting for interactive job to be scheduled ...
>> Your interactive job 4180551 has been successfully scheduled.
>> Establishing /usr/local/Cluster-Apps/sge/cvos/qlogin_wrapper session to 
>> host node001.beowulf.cluster ...
>> Last login: Thu Nov  8 11:01:03 2007 from frontend01.ecdf.ed.ac.uk
>> [ywan at node001 ~]$
>> 
>> -----------------------------------------------------------------------
>> [ywan at frontend02 ~]$ qlogin
>> local configuration frontend02.ecdf.ed.ac.uk not defined - using global 
>> configuration
>> Your job 4180550 ("QLOGIN") has been submitted
>> waiting for interactive job to be scheduled ...timeout (3 s) expired while 
>> waiting on socket fd 4
>> 
>> Your interactive job 4180550 has been successfully scheduled.
>> timeout (5 s) expired while waiting on socket fd 4
>> 
>> Your interactive job 4180550 has been successfully scheduled.
>> timeout (3 s) expired while waiting on socket fd 4
>> 
>> Your interactive job 4180550 has been successfully scheduled.
>> timeout (3 s) expired while waiting on socket fd 4
>> 
>> Your interactive job 4180550 has been successfully scheduled.
>> timeout (3 s) expired while waiting on socket fd 4
>> 
>> Your interactive job 4180550 has been successfully scheduled.
>> timeout (4 s) expired while waiting on socket fd 4
>> ...
>> -----------------------------------------------------------------------
>> 
>> I ran this test with the firewall disabled, and I also rebooted both of 
>> them with the same image. I even built a new queue for testing. All of 
>> these tests gave the same results as above.
>> 
>> 
>> Does anyone know a possible reason?
>> 
>> 
>> --Yuan
>> 
>> 
>> Yuan Wan
>> -- 
>> Unix Section
>> Information Services Infrastructure Division
>> University of Edinburgh
>> 
>> tel: 0131 650 4985
>> email: ywan at ed.ac.uk
>> 
>> 2032 Computing Services, JCMB
>> The King's Buildings,
>> Edinburgh, EH9 3JZ
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
>

