[GE users] eqw problem.

Lydia Heck lydia.heck at durham.ac.uk
Fri Sep 7 09:44:07 BST 2007



I typically get such an error if the directory does not exist or is not
accessible on the system that tries to run the job.
Could it be that the system on which the job is started cannot see
/home/.../my-script-folder?
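
To pull the failing directory out of the "error reason" lines, something
like the following might help. It is only a rough sketch: the function name
chdir_failures is made up for illustration, and it assumes each error
reason sits on a single line of the qstat -j output (it is only wrapped in
the mail quoted below).

```shell
# chdir_failures: read "qstat -j <jobid>" output on stdin and print the
# directory named in every "can't chdir to" error reason.
# (Hypothetical helper, not part of Grid Engine.)
chdir_failures() {
    sed -n "s/.*can't chdir to \(.*\)\$/\1/p"
}

# Intended use (replace <jobid> with the id of the job in Eqw):
#   qstat -j <jobid> | chdir_failures
```

The directory it prints is the one to look for on the execution host.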

Can you ssh or rsh into the system on which the job failed and check
whether, as the user, you can see that directory?
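
Once logged in on that system as the user, a small check along these lines
would do. Again this is just a sketch: check_dir is a made-up name, and the
host name and path in the usage comment are placeholders, not your real ones.

```shell
# check_dir DIR: report whether DIR exists and can be entered by the
# current user. (Hypothetical helper for illustration.)
check_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        # Also reached when a parent directory cannot be searched.
        echo "missing or unreachable: $dir"
        return 1
    fi
    if [ ! -x "$dir" ]; then
        echo "no permission to enter: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Placeholder usage, run as the job owner on the execution host:
#   ssh <exec-host>
#   check_dir /path/to/my-script-folder
```

If it reports "ok" on the submit host but not on the execution host, the
directory is probably not mounted or exported there.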

Lydia



On Thu, 6 Sep 2007, jiangfan shi wrote:

> Again, my Eqw problem was not solved after I tried choosing a specific
> queue to run in. With qstat -j jobid, I got the following:
>
>
> script_file:                /home/.../my.sh
> error reason    1:          09/06/2007 23:41:25 [7026:7382]: error: can't chdir to /home/.../my-script-folder
>
> What is this error? Can anyone help me?
>
> Thanks.
>
> Jiangfan
>
>
> On 9/6/07, Nicholas Senedzuk <nicholas.senedzuk at gmail.com> wrote:
> >
> > What Eqw means is that the job is in an error state (that's the E) and is
> > queued and waiting (that's the qw). The jobs will retry themselves after a
> > certain amount of time if you have them configured to. What is most likely
> > happening is that you have one system with a problem, so when a job
> > attempts to run on that system it errors out into the Eqw state and another
> > job is dispatched to run there. When you rerun these jobs, they end up
> > running on another host that does not have the problem and go into the r
> > state.
> >
> >
> > So what I would recommend is finding the system or systems that are
> > having the problem, disabling those nodes, and then running all the jobs to
> > see what happens. If no jobs go into the Eqw state, then you have found the
> > problem and just need to find out why jobs are not running correctly on
> > those nodes.
> >
> >
> > On 9/6/07, jiangfan shi <jiangfan.shi at gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I get an "Eqw" state when I use qstat to see the status of my jobs. Some
> > > jobs go into the "r" state successfully, but some go into "Eqw". When I
> > > run those jobs again, sometimes all of them go into the "r" state, but
> > > most of the time 3 or 8 of them go into "Eqw".
> > >
> > > In the ex.out log, I got the following:
> > >
> > > /bin/bash: /root/.bashrc: Permission denied
> > > /home/grad/jfshi/sandbox/threshold/mini-threshold/maetg: error while
> > > loading shared libraries: libstdc++.so.6: cannot open shared object
> > > file: No such file or directory
> > >
> > >
> > > Originally I used the -V flag with qsub to resolve this problem. It
> > > worked at that time, but now it gives me the "Eqw" problem.
> > >
> > > The following is the job information:
> > >
> > > 201036 0.00000 reuse-mini jfshi        Eqw   09/03/2007 21:28:30                                    1
> > > 201044 0.00000 reuse-mini jfshi        Eqw   09/03/2007 21:28:31
> > >
> > > Can anyone tell me the solution?
> > >
> > > Thanks.
> > >
> > > Jiangfan
> > >
> >
> >
>

------------------------------------------
Dr E L  Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

DURHAM, DH1 3LE
United Kingdom

e-mail: lydia.heck at durham.ac.uk

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645
___________________________________________
