[GE users] Re: new error messages on large # job submissions

Patrice Seyed apseyed at bu.edu
Sun Aug 8 02:16:40 BST 2004


Reuti,

There is a spool directory on each client, located in
/opt/gridengine/default/spool/<hostname>, and it contains:
active_jobs execd.pid jobs job_scripts messages

Once again, I think it may be a "load" issue on the master node, since
identical jobs to these were able to complete successfully. (I will soon be
investigating the option of upgrading SGE further, but first I wanted to
see what this error message corresponds to.)

-Patrice

On Thu, 5 Aug 2004, Reuti wrote:

> Hi again,
>
> >So I submitted about 10,000 jobs that do a simple iteration with find and
> >dd. They seemed to be going well and completing, apart from the fact that
> >they would heavily slow down the NFS server. I've now checked on these a
> >few days later and I see all the jobs are now in state "Eqw", and the
> >following message appears repeatedly in the qmaster messages file:
>
> Do you have the spool directories on the master for all the nodes, or on
> each node in something like /var/spool/sge? There is a HowTo at Sunsource to
> minimize the NFS traffic of SGE and have the spool directories local on the
> nodes.
>
>
> >3.q" that was not supposed to be there - killing
> >Wed Aug  4 04:04:20 2004|qmaster|linga|E|execd@compute-9-3.local reports
> >running job (21361.1/master) in queue "compute-9-3.q" that was not
> >supposed to be there - killing
>
> Yes, I also see this from time to time. Shut down the execd on the node, and
> remove all the remaining stuff in /var/spool/sge/<nodename>/active_jobs,
> job_scripts and jobs (or under $SGE_ROOT/default/spool/..., of course, if
> it's a central spool directory). Then restart the execd and the message
> shouldn't appear any longer.
>
>
> Cheers - Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
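
The cleanup step Reuti describes (empty active_jobs, job_scripts and jobs under the node's spool directory while the execd is stopped) could be scripted roughly as follows. This is a minimal sketch and not an SGE tool: the function name clear_stale_spool, the dry_run flag and the example path are assumptions for illustration, and stopping and restarting the execd is left to whatever procedure the site already uses.

```python
import shutil
from pathlib import Path

# The three subdirectories named in the reply above.
SUBDIRS = ("active_jobs", "job_scripts", "jobs")

def clear_stale_spool(node_spool, dry_run=True):
    """List the leftovers under the job subdirectories of one node's spool
    directory, and delete them unless dry_run is set. Run this only while
    the execd on that node is shut down."""
    removed = []
    for name in SUBDIRS:
        subdir = Path(node_spool) / name
        if not subdir.is_dir():
            continue
        for entry in sorted(subdir.iterdir()):
            removed.append(entry)
            if dry_run:
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)   # per-job directory
            else:
                entry.unlink()         # plain file or symlink
    return removed

if __name__ == "__main__":
    # Hypothetical node spool path; a dry run only reports what it would remove.
    for path in clear_stale_spool("/var/spool/sge/compute-9-3", dry_run=True):
        print("would remove", path)
```

The subdirectories themselves, execd.pid and the messages file are left in place; only the contents of the three job directories are touched.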
