[GE users] eqw problem.
jiangfan.shi at gmail.com
Thu Sep 6 16:40:43 BST 2007
Nicholas, thanks for your suggestion. I have now found that only some nodes
run my program correctly, so I use the -q option of qsub to direct all my
jobs to those nodes.
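A minimal sketch of that workaround, in case it helps others. The queue name all.q, the host names node01 and node02, and the script name job.sh are placeholders, not values from my cluster. -q restricts the job to the listed queues or queue instances, and -V exports the current environment to the job.

```shell
#!/bin/sh
# Build a comma-separated list of queue instances (queue@host) for qsub -q.
# Usage: queue_list QUEUE HOST [HOST...]
queue_list() {
    queue=$1
    shift
    list=""
    for host in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}${queue}@${host}"
    done
    printf '%s\n' "$list"
}

# Placeholder queue and host names; substitute the nodes known to work.
GOOD_QUEUES=$(queue_list all.q node01 node02)

# The command is only printed here so the sketch runs without Grid Engine;
# remove the leading "echo" to actually submit.
echo qsub -V -q "$GOOD_QUEUES" job.sh
```

Keeping the list of good hosts in one place makes it easy to add a node back once it has been fixed.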
On 9/6/07, Nicholas Senedzuk <nicholas.senedzuk at gmail.com> wrote:
> Eqw means that the job is in an error state (that's the E) and is queued and
> waiting (that's the qw). The jobs will retry themselves after a certain
> amount of time if you have them configured to. Most likely you have one
> system with a problem: when a job attempts to run on that system it errors
> out into the Eqw state, and another job is then dispatched to the same
> system. When you rerun these jobs, they end up running on another host that
> is not having a problem and go into the r state. So I would recommend
> finding the system or systems that are having the problem, disabling those
> nodes, then running all the jobs and seeing what happens. If no jobs go into
> the Eqw state, then you have found the problem and you just need to find out
> why the jobs are not running on that node.
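The procedure described above could be sketched roughly as follows. This assumes the job state is the fifth column of plain qstat output, as in the job listing quoted further down; the queue instance all.q@node07 is a made-up name, and the commented commands need a working Grid Engine installation.

```shell
#!/bin/sh
# Read qstat-style lines on stdin and print the IDs of jobs whose
# state column (assumed to be the 5th field) is Eqw.
eqw_job_ids() {
    awk '$5 == "Eqw" { print $1 }'
}

# On a cluster one would typically run something like:
#   qstat -u "$USER" | eqw_job_ids   # list the jobs stuck in Eqw
#   qstat -j <job_id>                # the "error reason" line shows the cause
#   qmod -d all.q@node07             # disable the suspect queue instance (hypothetical name)
#   qmod -cj <job_id>                # clear the error state so the job is rescheduled
```

Once the suspect node is disabled and the error state is cleared, any job that still ends up in Eqw points to a different cause.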
> On 9/6/07, jiangfan shi <jiangfan.shi at gmail.com> wrote:
> > Hi,
> > I get an "Eqw" state when I use qstat to check the status of my jobs. Some
> > jobs go into the "r" state successfully, but some go into "Eqw". When I
> > run those jobs again, sometimes all of them go into the "r" state, but
> > most of the time 3 or 8 of them go into "Eqw".
> > For the ex.out log information, I got the following:
> > /bin/bash: /root/.bashrc: Permission denied
> > /home/grad/jfshi/sandbox/threshold/mini-threshold/maetg: error while
> > loading shared libraries: libstdc++.so.6: cannot open shared object file:
> > No such file or directory
> > Originally I used the -V flag with qsub to resolve this problem. It
> > worked at the time, but now I get the "Eqw" problem.
> > The following is the job information:
> > 201036 0.00000 reuse-mini jfshi Eqw 09/03/2007 21:28:30 1
> > 201044 0.00000 reuse-mini jfshi Eqw 09/03/2007 21:28:31
> > Can anyone tell me the solution?
> > Thanks.
> > Jiangfan