[GE users] ptf complains: Job does not exist

Reuti reuti at staff.uni-marburg.de
Sun Aug 13 22:57:59 BST 2006


On 12.08.2006 at 02:47, Thiep Duong wrote:

> Hi Reuti,
>
> The user sits at a PC and connects via RealVNC (client 3.3.7) to
> 'svnc3', a Sun machine running Solaris 9.
> The svnc3 host is a submit host, and from there they issue the qsh
> command.
>
> From the same submit host, a user can get a session from one queue
> but not the other.  After many tries the user can sometimes get
> a session on the same queue, but not consistently.
>
> Is there anything suspicious about VNC or .Xauthority that might
> prevent it from working?

Not at the moment.

> What I have noticed is that when things are working, an
> xterm session comes up within 3 seconds.  When we have to
> wait 5-6 seconds, it doesn't work anymore.
> Our network is a mixture of 100 Mb and 1000 Mb links, and we don't
> think there is any network problem.
>
> It's a strange problem, but not an isolated one.  We have seen it
> with more than one user.

How many users are using qsh at the same time? If you use plain X11,
they will all connect to a port >= 6000. Is there a firewall
limiting this range? Is there any additional information in the /var/log
files about a failed connection on 'svnc3' or on the client 'scblad02'?
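
To make the port question concrete: an X11 display number N maps to TCP port 6000 + N, so a VNC display such as svnc3:1 would listen on 6001. The following is only a minimal sketch for checking this from the execution host. The function names and the example display value are hypothetical, and it assumes the X/VNC server accepts TCP connections at all.

```python
import socket

X11_BASE_PORT = 6000  # display :N listens on TCP port 6000 + N


def x11_port(display):
    """Return the TCP port for a DISPLAY value such as 'svnc3:1.0'."""
    # Everything after the last ':' is 'display[.screen]'.
    _host, _, rest = display.rpartition(":")
    return X11_BASE_PORT + int(rest.split(".")[0])


def can_reach_display(display, timeout=3.0):
    """Try a plain TCP connect to the display's port; True if it succeeds."""
    host = display.rpartition(":")[0] or "localhost"
    try:
        with socket.create_connection((host, x11_port(display)), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered by a firewall, timed out, or host not resolvable.
        return False
```

Running can_reach_display("svnc3:1") on scblad02 (with the real display number in place of 1) would show whether that port is reachable from there. A successful connect only proves the port is open; it says nothing about xauth permissions.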

-- Reuti


> Thanks.
>
> Thiep
>
> Reuti wrote:
>> Hi,
>> On 11.08.2006 at 04:28, Thiep Duong wrote:
>>> I am getting the above messages, and the last discussion of this
>>> issue was sometime in Aug/2005, without any conclusion/resolution.
>>>
>>> Let me try to describe it again:
>>> I have more than one user who is using a VNC server window to submit
>>> an interactive job:
>>>     /opt/app/SGE/6.0/bin/sol-sparc64/qsh -q solaris16G.q at scblad02
>>>
>>> The message the user gets back is either:
>>>
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 27807 has been successfully scheduled.
>>> Or
>>> waiting for interactive job to be scheduled ...
>>> Could not start interactive job.
>>>
>>>
>>> No xterm window comes up. It looks like no resource is
>>> found.  I added the -now no switch so that we can see what's going on.
>>> qstat shows a job in qw state (job-id 27807); I can run
>>> qstat -j 27807 for 3-5 seconds, then the job is just gone.
>>>
>>> The user actually got an email telling him that his job has completed.
>>>
>>> It's not a DISPLAY issue: the user can open qsh using another queue.
>> Is it working in one queue, but not the other, on the same machine?
>> I don't understand your configuration: your user is sitting at his PC,
>> making a VNC connection to the master node of your cluster, and
>> issuing a qsh command there?
>> -- Reuti
>>>
>>> Nothing found in spool/qmaster/messages
>>>
>>> Looking at the common/accounting file, the job seems to have finished:
>>> solaris16G.q:scblad02:ccds:zeke:INTERACTIVE:27807:sge:0
>>> :1155234047:1155234047:1155234047:0:1:0:0:0:0.000000:0
>>> :0:0:0:0:0:0:0.000000:0:0:0:0:0:0:eagle:zeke:NONE
>>> :1:0:0.000000:0.000000:0.000000:-U zeke -q solaris16G.q at scblad02
>>> -l num_proc=1 -soft -l group=scdc -I y -P eagle:0.000000:NONE: 
>>> 0.000000
>>>
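
The accounting record quoted above is colon-separated, and naming the leading fields makes it easier to read. This is a rough sketch only: the field order is taken from the accounting(5) man page as I remember it for SGE 6.0, so check it against the man page of your release, and the helper name is hypothetical.

```python
# First 14 fields of an accounting record, in order
# (assumed from accounting(5); verify for your release).
FIELDS = [
    "qname", "hostname", "group", "owner", "job_name", "job_number",
    "account", "priority", "submission_time", "start_time", "end_time",
    "failed", "exit_status", "ru_wallclock",
]


def parse_accounting(record):
    """Map the leading fields of one accounting line to their names."""
    values = record.strip().split(":")
    return dict(zip(FIELDS, values))  # extra trailing fields are ignored


# Start of the record for job 27807, with the mail line wrap removed.
RECORD = ("solaris16G.q:scblad02:ccds:zeke:INTERACTIVE:27807:sge:0"
          ":1155234047:1155234047:1155234047:0:1:0")

job = parse_accounting(RECORD)
print(job["job_number"], job["failed"], job["exit_status"], job["ru_wallclock"])
```

If that field order is right, the record has identical submission, start and end times, failed = 0, exit_status = 1 and zero wallclock. In other words, the job ended in the same second it started, with a non-zero exit status.
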
>>>
>>> Looking in spool/scblad02 (execution host)
>>> 08/10/2006 11:12:59|execd|scblad02|W|reaping job "27801" ptf  
>>> complains: Job does not exist
>>> 08/10/2006 11:19:30|execd|scblad02|W|reaping job "27805" ptf  
>>> complains: Job does not exist
>>> 08/10/2006 11:20:47|execd|scblad02|W|reaping job "27807" ptf  
>>> complains: Job does not exist
>>>
>>> It's not the execution host, because other users can open/run jobs
>>> on the same queue.
>>>
>>> What can we do to debug further?  I am using the 6.0u7 release.
>>>
>>> Thanks in advance.
>>>
>>> Thiep
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
