[GE users] Interactive jobs not starting

VS Ang vs_ang at yahoo.com
Fri Nov 30 21:22:14 GMT 2007


Yes, this is with the "patched" ssh built for tight integration. There are no firewalls on the cluster. I also tried using the patched "ssh" command to log in to the node directly, and that works fine. It fails only when going through qrsh or qlogin.

----- Original Message ----
From: Reuti <reuti at staff.uni-marburg.de>
To: users at gridengine.sunsource.net
Sent: Friday, November 30, 2007 8:50:18 AM
Subject: Re: [GE users] Interactive jobs not starting


Hi,

On 30.11.2007 at 00:12, VS Ang wrote:

> Hello,
>
> When I attempt to submit interactive jobs using qrsh or qlogin  
> commands, the job never starts. The "qrsh" command simply returns  
> after a while:

Is this with the default rsh or with the ssh you defined (from your other
post)? Is any firewall active that blocks certain ports?

-- Reuti


> $ qrsh -verbose
> Your job 45 ("QRLOGIN") has been submitted
> waiting for interactive job to be scheduled ...timeout (3 s) expired while waiting on socket fd 4
>
> Could not start interactive job.
>
> Same thing happens with qlogin:
>
> $ qlogin -verbose
> Your job 46 ("QLOGIN") has been submitted
> waiting for interactive job to be scheduled ...timeout (4 s) expired while waiting on socket fd 4
>
> Could not start interactive job.
>
> Also, in the messages file on the compute nodes, I see the following
> errors:
>
> 11/29/2007 16:01:26|execd|compute-1-5|E|shepherd of job 26.1 exited with exit status = 9
> 11/29/2007 16:01:26|execd|compute-1-5|W|reaping job "26" ptf complains: Job does not exist
> 11/29/2007 16:07:23|execd|compute-1-5|E|shepherd of job 27.1 exited with exit status = 9
> 11/29/2007 16:07:23|execd|compute-1-5|W|reaping job "27" ptf complains: Job does not exist
> 11/29/2007 16:44:37|execd|compute-1-5|W|reaping job "33" ptf complains: Job does not exist
> 11/29/2007 17:57:05|execd|compute-1-5|E|shepherd of job 43.1 exited with exit status = 11
> 11/29/2007 17:57:05|execd|compute-1-5|W|reaping job "43" ptf complains: Job does not exist
> 11/29/2007 18:07:34|execd|compute-1-5|W|reaping job "46" ptf complains: Job does not exist
>
>
> And, on the qmaster host:
>
> 11/29/2007 17:57:06|qmaster|admin|W|job 43.1 failed on host compute-1-5.local general before job because: 11/29/2007 17:57:05 [0:18592]: can't open file /tmp/43.1.all.q/pid: No such file or directory
> 11/29/2007 17:57:06|qmaster|admin|W|rescheduling job 43.1
> 11/29/2007 18:06:27|qmaster|admin|W|job 44.1 failed on host compute-1-4.local assumedly after job because: job 44.1 died through signal KILL (9)
> 11/29/2007 18:07:03|qmaster|admin|W|job 45.1 failed on host compute-1-1.local assumedly after job because: job 45.1 died through signal KILL (9)
> 11/29/2007 18:07:35|qmaster|admin|W|job 46.1 failed on host compute-1-5.local assumedly after job because: job 46.1 died through signal KILL (9)
>
> I am using SGE 6.1u2 (compiled from source). Any pointers will
> be appreciated!
>
> Srihari
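
The execd and qmaster lines quoted above appear to share a pipe-delimited layout (timestamp|daemon|host|level|message). As a rough aid for spotting which jobs the shepherd reported as failed, here is a minimal Python sketch that splits such lines and collects the shepherd exit statuses. The field layout is inferred only from the lines quoted in this thread, the second number in "26.1" is assumed to be a task id, and the function names (parse_line, shepherd_exits, count_by_status) are hypothetical helpers, not part of Grid Engine.

```python
import re
from collections import Counter

# Sample lines copied from the compute-node messages quoted in this thread,
# with the mail client's line wrapping undone.
SAMPLE = """\
11/29/2007 16:01:26|execd|compute-1-5|E|shepherd of job 26.1 exited with exit status = 9
11/29/2007 16:01:26|execd|compute-1-5|W|reaping job "26" ptf complains: Job does not exist
11/29/2007 17:57:05|execd|compute-1-5|E|shepherd of job 43.1 exited with exit status = 11
"""

# Pattern for the shepherd exit message as it appears in the quoted log.
SHEPHERD_EXIT = re.compile(
    r"shepherd of job (\d+)\.(\d+) exited with exit status = (\d+)"
)


def parse_line(line):
    """Split one log line into its five pipe-separated fields.

    Returns None for lines that do not have the assumed layout.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, daemon, host, level, message = parts
    return {
        "timestamp": timestamp,
        "daemon": daemon,
        "host": host,
        "level": level,
        "message": message,
    }


def shepherd_exits(text):
    """Return (job_id, task_id, exit_status, host) for each shepherd exit line."""
    results = []
    for line in text.splitlines():
        record = parse_line(line)
        if record is None:
            continue
        match = SHEPHERD_EXIT.search(record["message"])
        if match:
            job_id, task_id, status = (int(g) for g in match.groups())
            results.append((job_id, task_id, status, record["host"]))
    return results


def count_by_status(text):
    """Count how many shepherd exits were seen per exit status."""
    return Counter(status for _, _, status, _ in shepherd_exits(text))


if __name__ == "__main__":
    for job_id, task_id, status, host in shepherd_exits(SAMPLE):
        print(f"job {job_id}.{task_id} on {host}: shepherd exit status {status}")
```

To run it against a real file, read the file's contents into a string and pass that to shepherd_exits in place of SAMPLE. The location of the messages file depends on the installation, so it is not hard-coded here.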
