[GE users] weird qlogin problem on cluster

Reuti reuti at staff.uni-marburg.de
Thu Nov 8 12:55:15 GMT 2007


Hi,

On 08.11.2007 at 13:00, Yuan Wan wrote:

> I hope someone can give me some hints on my qlogin problem.
>
> Our cluster has two frontend nodes from which users are allowed to
> qlogin to interactive work nodes. qlogin had been working totally fine.
>
> But yesterday I found that one of the login nodes (frontend02) has a
> problem with qlogin: the qlogin wrapper is not called after the
> scheduler allocates a slot on a work node. The qlogin procedure just
> hangs there with the message
> 'timeout (3 s) expired while waiting on socket fd 4'. The other
> frontend node works perfectly with qlogin.

So it was also working before with this node?

It could be that a local file system (e.g. /var or /tmp) is full, or
that the node lost its NFS mount for some reason.
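Both checks can be scripted on the affected frontend. The following is a minimal Python sketch, assuming a Linux node that exposes /proc/mounts; the 5% free-space threshold and the shared mount point are hypothetical placeholders, not values taken from this cluster. Reading the mount table avoids touching the NFS file system itself.

```python
import os
import shutil

# Hypothetical values: adjust for your own cluster layout.
LOCAL_PATHS = ["/var", "/tmp"]
MIN_FREE_FRACTION = 0.05  # assumption: flag a file system below 5% free
NFS_MOUNT = "/path/to/shared/mount"  # hypothetical shared NFS mount point


def free_fraction(path):
    """Return the fraction of free space on the file system holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def parse_mounts(text):
    """Map mount point -> file system type from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts[fields[1]] = fields[2]
    return mounts


def is_nfs_mounted(mount_point, mounts_text):
    """True if `mount_point` is listed with an nfs or nfs4 type."""
    fstype = parse_mounts(mounts_text).get(mount_point, "")
    return fstype.startswith("nfs")


if __name__ == "__main__":
    # Check free space on the local file systems.
    for path in LOCAL_PATHS:
        if os.path.exists(path):
            frac = free_fraction(path)
            status = "LOW" if frac < MIN_FREE_FRACTION else "ok"
            print(f"{path}: {frac:.1%} free [{status}]")
    # Check that the shared directory is still NFS-mounted.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            mounted = is_nfs_mounted(NFS_MOUNT, f.read())
        print(f"{NFS_MOUNT}: {'NFS mounted' if mounted else 'NOT NFS mounted'}")
```

Running it on both frontends and comparing the output would show whether frontend02 differs in either respect.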

-- Reuti

> -----------------------------------------------------------------------
> [ywan at frontend01 ~]$ qlogin
> local configuration frontend01.ecdf.ed.ac.uk not defined - using global configuration
> Your job 4180551 ("QLOGIN") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 4180551 has been successfully scheduled.
> Establishing /usr/local/Cluster-Apps/sge/cvos/qlogin_wrapper session to host node001.beowulf.cluster ...
> Last login: Thu Nov  8 11:01:03 2007 from frontend01.ecdf.ed.ac.uk
> [ywan at node001 ~]$
>
> -----------------------------------------------------------------------
> [ywan at frontend02 ~]$ qlogin
> local configuration frontend02.ecdf.ed.ac.uk not defined - using global configuration
> Your job 4180550 ("QLOGIN") has been submitted
> waiting for interactive job to be scheduled ...timeout (3 s) expired while waiting on socket fd 4
>
> Your interactive job 4180550 has been successfully scheduled.
> timeout (5 s) expired while waiting on socket fd 4
>
> Your interactive job 4180550 has been successfully scheduled.
> timeout (3 s) expired while waiting on socket fd 4
>
> Your interactive job 4180550 has been successfully scheduled.
> timeout (3 s) expired while waiting on socket fd 4
>
> Your interactive job 4180550 has been successfully scheduled.
> timeout (3 s) expired while waiting on socket fd 4
>
> Your interactive job 4180550 has been successfully scheduled.
> timeout (4 s) expired while waiting on socket fd 4
> ...
> -----------------------------------------------------------------------
>
> I ran this test with the firewall disabled, and I also rebooted both
> nodes with the same image. I even built a new queue for testing. All
> these tests gave the same results as above.
>
>
> Does anyone know what the reason might be?
>
>
> --Yuan
>
>
> Yuan Wan
> -- 
> Unix Section
> Information Services Infrastructure Division
> University of Edinburgh
>
> tel: 0131 650 4985
> email: ywan at ed.ac.uk
>
> 2032 Computing Services, JCMB
> The King's Buildings,
> Edinburgh, EH9 3JZ
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
