[GE users] Fwd: qlogin - waiting on socket fd 4
margaret_Doll at brown.edu
Tue Apr 27 16:50:48 BST 2010
Both the queue manager failure and the earlier memory loss on the
head node were caused by one of our users queuing up 30,000 jobs.
I deleted the jobs, rebooted the head node and everything seems to be
working again.
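For anyone who hits the same symptom, here is a minimal sketch of how such a backlog could be spotted. It assumes the default `qstat` listing, with two header lines and the job owner in the fourth column; the function name `count_jobs_per_user` is made up for this example and is not part of SGE.

```shell
# Count jobs per owner from a qstat listing read on stdin.
# Assumption: default qstat layout, two header lines, owner in column 4.
count_jobs_per_user() {
    awk 'NR > 2 && NF >= 4 { count[$4]++ }
         END { for (u in count) print count[u], u }' |
        sort -k1,1nr -k2,2
}

# Possible usage on the head node (pending jobs of all users):
#   qstat -u '*' -s p | count_jobs_per_user
# A runaway user's jobs could then be removed with:
#   qdel -u <username>
```

The user with the largest count is listed first, so a single user holding tens of thousands of jobs stands out at once.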
Begin forwarded message:
> From: Margaret Doll <Margaret_Doll at brown.edu>
> Date: April 27, 2010 9:25:05 AM EDT
> To: Grid Engine <users at gridengine.sunsource.net>, ROCKS <npaci-rocks-discussion at sdsc.edu>
> Subject: Fwd: qlogin - waiting on socket fd 4
> Looking at qw jobs with qmon, the jobs are being held because
> "Could not get scheduling info"
> Begin forwarded message:
>> From: Margaret Doll <margaret_doll at brown.edu>
>> Date: April 27, 2010 8:48:02 AM EDT
>> To: Grid Engine <users at gridengine.sunsource.net>, ROCKS <npaci-rocks-discussion at sdsc.edu>
>> Subject: qlogin - waiting on socket fd 4
>> I am using Rocks 5.2, Red Hat 2.6.18-53.1.14.el5 and SGE 6.1u4
>> Yesterday, I rebooted the head node as it was in a state where the
>> memory was locked up.
>> I did not reboot the compute nodes.
>> cluster-fork date showed all the compute nodes as active
>> We are now having problems with the queues working
>> -bash-3.1$ qlogin -pe queue2 1
>> Your job 228189 ("QLOGIN") has been submitted
>> waiting for interactive job to be scheduled ...timeout (3 s)
>> expired while waiting on socket fd 4
>> .error: error waiting on socket for client to connect: Interrupted
>> system call
>> After receiving this error I did reboot the compute nodes that are
>> used by queue2. The reboot made no difference.
>> What's wrong? How do I fix the problem?
>> Thanks for your help