[GE users] eqw problem.

Beadles, Jeff jeff_beadles at mentor.com
Fri Sep 7 13:08:06 BST 2007


It means that the execution host couldn't see /home/.../my-script-folder
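To confirm this on a particular node, you could log in to the execution host as the job owner and test whether the directory can be entered at all. Below is a minimal Python sketch, not part of Grid Engine. `can_chdir` is a hypothetical helper, and it only approximates what happens at job start, because it runs with the permissions of whoever launches it.

```python
import os


def can_chdir(path):
    """Try to change into `path`, as a job start-up would need to.

    Returns (True, None) on success, or (False, <OS error message>) on failure.
    The original working directory is always restored afterwards.
    """
    original = os.getcwd()
    try:
        os.chdir(path)
    except OSError as exc:
        return False, exc.strerror
    finally:
        os.chdir(original)
    return True, None
```

For example, call `can_chdir(...)` with the real job directory substituted. A "No such file or directory" result suggests the directory isn't visible on that node at all, while "Permission denied" points at permissions.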
 
Run qacct -j job# to see which host the job failed on, then go check that host to see what's wrong.
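With many failed jobs, reading `qacct -j` output by hand gets tedious. The Python sketch below splits that output into records and picks out `jobnumber` and `hostname` for records whose `failed` field is non-zero. It assumes the usual layout (records separated by a line of `=` characters, one "name value" pair per line). `parse_qacct` and `failed_jobs` are hypothetical helpers, not Grid Engine tools.

```python
def parse_qacct(text):
    """Split `qacct -j` output into one dict per accounting record.

    Assumption: records are separated by lines made only of '=' characters,
    and every other non-blank line is "<field name> <value>".
    """
    records, current = [], {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if set(stripped) == {"="}:
            # Separator line: close the previous record, if any.
            if current:
                records.append(current)
            current = {}
            continue
        parts = stripped.split(None, 1)
        current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current:
        records.append(current)
    return records


def failed_jobs(records):
    """Return (jobnumber, hostname) for records whose 'failed' code is not 0."""
    result = []
    for rec in records:
        # The 'failed' value starts with a numeric code, optionally followed by text.
        code = (rec.get("failed") or "0").split()[0]
        if code != "0":
            result.append((rec.get("jobnumber"), rec.get("hostname")))
    return result
```

The text itself could be captured with `subprocess.run(["qacct", "-j", job_id], capture_output=True, text=True).stdout`.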
 
(Is it just me, or is it odd that qstat tells you everything you want to know except where the failed job tried to run?)
 
Jeff

________________________________

From: jiangfan shi [mailto:jiangfan.shi at gmail.com]
Sent: Thu 9/6/2007 9:47 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] eqw problem.


My Eqw problem is still not solved, even after I tried choosing a specific queue to run in. Running qstat -j jobid, I got the following:


script_file:                /home/.../my.sh
error reason    1:          09/06/2007 23:41:25 [7026:7382]: error: can't chdir to /home/.../my-script-folder 

What does this error mean? Can anyone help me?

Thanks.

Jiangfan



On 9/6/07, Nicholas Senedzuk <nicholas.senedzuk at gmail.com> wrote: 

	What Eqw means is that the job is in an error state (that's the E) and is queued and waiting (that's the qw). The jobs will retry themselves after a certain amount of time if you have them configured to. What is most likely happening is that one of your systems has a problem: when a job attempts to run on that system, it errors out into the Eqw state, and another job is then dispatched to that system. So when you rerun these jobs, they end up running on another host that isn't having a problem, and they go into the r state. 
	
	What I would recommend is finding the system or systems that are having the problem, disabling those nodes, and then running all the jobs again to see what happens. If no jobs go into the Eqw state, you have found the problem, and you just need to find out why the jobs don't run correctly on that node. 
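The search for the problem node amounts to looking for a host that shows up repeatedly among the failures. Given (job ID, host) pairs, for instance collected with `qacct -j` as Jeff suggests, here is a minimal Python sketch that ranks hosts by how many failed jobs ran there. The `suspect_hosts` name and its `min_failures` threshold are illustrative, not Grid Engine features.

```python
from collections import Counter


def suspect_hosts(failures, min_failures=1):
    """Rank hosts by how many failed jobs ran on them.

    `failures` is an iterable of (job_id, host) pairs. Hosts with fewer than
    `min_failures` failures are dropped; ties keep first-seen order.
    """
    counts = Counter(host for _job_id, host in failures)
    return [host for host, n in counts.most_common() if n >= min_failures]
```

A host that dominates the list is the natural first candidate to disable before rerunning the jobs.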
	
	
	
	
	On 9/6/07, jiangfan shi <jiangfan.shi at gmail.com> wrote: 

		Hi,
		
		I get an "Eqw" error when I use qstat to check the status of my jobs. Some jobs go into the "r" state successfully, but some go into "Eqw". When I run those jobs again, sometimes all of them go into "r", but most of the time 3 or 8 of them still end up in "Eqw". 
		
		In the ex.out log, I found the following:
		
		/bin/bash: /root/.bashrc: Permission denied 
		/home/grad/jfshi/sandbox 
		/threshold/mini-threshold/maetg: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
		
		

		Originally I used the -V flag with qsub to work around this problem, and it worked at the time. But now I get the "Eqw" problem. 
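The `libstdc++.so.6: cannot open shared object file` message means the dynamic loader did not find that library in any directory it searched. Given that `-V` helped before, one plausible explanation is that the job no longer sees the submitting shell's `LD_LIBRARY_PATH` (`-V` exports the submission environment to the job). Here is a simplified Python sketch that checks only an `LD_LIBRARY_PATH`-style value. `find_in_ld_library_path` is a hypothetical helper, and because the real loader also consults other locations, a `None` result is a hint rather than proof.

```python
import os


def find_in_ld_library_path(libname, ld_library_path):
    """Return the first directory listed in `ld_library_path` that contains
    `libname`, or None if none does.

    Simplification: only LD_LIBRARY_PATH is searched. The real dynamic loader
    also checks RPATH/RUNPATH, /etc/ld.so.cache and the default system
    directories. Empty entries are skipped here.
    """
    for directory in ld_library_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, libname)):
            return directory
    return None
```

Calling `find_in_ld_library_path("libstdc++.so.6", os.environ.get("LD_LIBRARY_PATH", ""))` once from an interactive shell and once from inside a job script on the failing host would show whether the job sees the same search path.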
		
		Here is the job information: 
		
		201036 0.00000 reuse-mini jfshi        Eqw   09/03/2007 21:28:30        1
		201044 0.00000 reuse-mini jfshi        Eqw   09/03/2007 21:28:31 
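To pull the IDs of jobs stuck in an error state out of a plain `qstat` listing like the one above, here is a Python sketch. It assumes the default column order (job-ID, prior, name, user, state, ...) and job names without spaces. `jobs_in_error` is a hypothetical helper.

```python
def jobs_in_error(qstat_lines):
    """Return job IDs whose state column contains 'E' (error), e.g. 'Eqw'.

    Assumes the default plain `qstat` column order:
    job-ID, prior, name, user, state, submit/start at, ...
    """
    ids = []
    for line in qstat_lines:
        fields = line.split()
        # Skip the header, separator lines and wrapped continuation lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        if "E" in fields[4]:
            ids.append(fields[0])
    return ids
```

The resulting IDs are the ones to feed to `qstat -j` or `qacct -j` when looking for the failing host.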
		
		Can anyone tell me the solution? 
		
		Thanks.
		
		Jiangfan





