[GE users] eqw problem.

Nicholas Senedzuk nicholas.senedzuk at gmail.com
Thu Sep 6 15:38:09 BST 2007



Eqw means that the job is in an error state (that's the E) and is queued
and waiting (that's the qw). The jobs will retry themselves after a certain
amount of time if you have them configured to. What is most likely happening
is that one system has a problem: when a job attempts to run on that system
it errors out into the Eqw state, and another job is then dispatched to the
same system. When you rerun these jobs, they end up running on another host
that is not having a problem and go into the r state.
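
To make the state letters concrete, here is a minimal Python sketch that
expands a qstat state string letter by letter. The table covers only the
letters that come up in this thread (E, q, w, r); the function names
decode_state and needs_attention are made up for illustration and are not
part of Grid Engine.

```python
# Meanings of the state letters discussed in this thread only;
# qstat can show other letters that are not listed here.
STATE_LETTERS = {
    "E": "error",
    "q": "queued",
    "w": "waiting",
    "r": "running",
}


def decode_state(state):
    """Expand a qstat state string such as 'Eqw' letter by letter."""
    return [STATE_LETTERS.get(ch, "unknown (" + ch + ")") for ch in state]


def needs_attention(state):
    """A job needs a look when its state contains the error flag 'E'."""
    return "E" in state


print(decode_state("Eqw"))     # ['error', 'queued', 'waiting']
print(needs_attention("Eqw"))  # True
print(needs_attention("r"))    # False
```

So a job shown as Eqw is still sitting in the pending list, but with the
error flag set on it.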


So what I would recommend is finding the system or systems that are having
the problem, disabling those nodes, and then running all the jobs again to
see what happens. If no jobs go into the Eqw state, then you have found the
problem and just need to find out why the jobs are not running correctly on
that node.
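
One rough way to narrow down the bad node, sketched in Python: assuming you
have written down, for each failed job, the host it last tried to run on
(the error reason shown by qstat -j <job_id> is one place to look), you can
tally the failures per host. The function name suspect_hosts, the host
names, and the third job ID below are invented for the example.

```python
from collections import Counter


def suspect_hosts(failures, min_failures=2):
    """Return (host, count) pairs with at least min_failures failed jobs,
    ordered from most to fewest failures."""
    counts = Counter(host for _job_id, host in failures)
    return [(host, n) for host, n in counts.most_common() if n >= min_failures]


# Hypothetical records of (job_id, host); the host names are made up.
failures = [
    ("201036", "node07"),
    ("201044", "node07"),
    ("201050", "node03"),
]

print(suspect_hosts(failures))  # [('node07', 2)]
```

Once one host stands out, its queue instance can be disabled with
qmod -d <queue>@<host>, and the error state on the stuck jobs can be
cleared with qmod -cj <job_id> so that they are scheduled again.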


On 9/6/07, jiangfan shi <jiangfan.shi at gmail.com> wrote:
>
> Hi,
>
> I get an "eqw" state when I use qstat to see the status of my jobs. Some
> jobs go into the "r" state successfully, but some go into the "eqw" state.
> When I run those jobs again, sometimes all of them go into the "r" state,
> but most of the time 3 or 8 of them still go into the "eqw" state.
>
> For the ex.out log information, I got the following:
>
> /bin/bash: /root/.bashrc: Permission denied
> /home/grad/jfshi/sandbox/threshold/mini-threshold/maetg: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
>
>
> Originally I used the -V flag with qsub to resolve this problem. It worked
> at that time, but now it gives me the "eqw" problem.
>
> The following is the job information:
>
> 201036 0.00000 reuse-mini jfshi        Eqw   09/03/2007 21:28:30                                    1
> 201044 0.00000 reuse-mini jfshi        Eqw   09/03/2007 21:28:31
>
> Can anyone tell me the solution?
>
> Thanks.
>
> Jiangfan
>


