[GE users] weird qlogin problem on cluster

Reuti reuti at staff.uni-marburg.de
Sat Nov 10 14:07:43 GMT 2007


On 08.11.2007 at 14:54, Yuan Wan wrote:

>> On 08.11.2007 at 13:00, Yuan Wan wrote:
>>> I hope someone can give me a hint on my qlogin problem.
>>> Our cluster has two frontend nodes which are allowed to qlogin to
>>> interactive work nodes. qlogin used to work fine.
>>> But I found yesterday that one of the login nodes (frontend02) has a
>>> problem doing qlogin: the qlogin wrapper is not called after the
>>> scheduler allocates a slot on a work node. So the qlogin procedure
>>> just halts there with the message
>>> 'timeout (3 s) expired while waiting on socket fd 4'. But the other
>>> frontend node works fine with qlogin.
>> So it was also working on this node before?
>> It could be that the local file system (e.g. /var or /tmp) is
>> full, or that the node lost the NFS mount for some reason.
>> -- Reuti
> The local file system is nearly empty, and the shared file system is fine.
> --Yuan

Ah, sorry. So the problem is on a submit host, not in the cluster. On
frontend02, is the telnet program available, and is no firewall
active? Is the qrsh_wrapper also accessible at
/usr/local/Cluster-Apps/sge/cvos, i.e. is that path mounted there?
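
To compare the two frontends quickly, the checks above could be scripted
roughly as follows. This is only a minimal sketch: the function name
check_submit_host and its parameters are made up for illustration, the
wrapper directory is the one mentioned in this thread, and a firewall check
is not covered because it depends on the local setup.

```python
import os
import shutil


def check_submit_host(wrapper_dir, client="telnet", scratch_dirs=("/var", "/tmp")):
    """Collect basic sanity checks for a submit host (illustrative helper)."""
    report = {}

    # Is the client program that qlogin starts found on PATH?
    report["client_on_path"] = shutil.which(client) is not None

    # Does the directory holding the wrapper exist, and can we read and enter it?
    # A missing NFS mount typically shows up here as a missing or empty directory.
    report["wrapper_dir_accessible"] = os.path.isdir(wrapper_dir) and os.access(
        wrapper_dir, os.R_OK | os.X_OK
    )

    # Fraction of free space on the local file systems named in the thread.
    for path in scratch_dirs:
        if os.path.isdir(path):
            usage = shutil.disk_usage(path)
            if usage.total > 0:
                report["free_fraction:" + path] = usage.free / usage.total

    return report


if __name__ == "__main__":
    # Path taken from this thread; adjust it to the local installation.
    result = check_submit_host("/usr/local/Cluster-Apps/sge/cvos")
    for name, value in result.items():
        print(name, value)
```

Running it on frontend02 and on the working frontend and comparing the
output should show whether one of these preconditions differs between them.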

-- Reuti
