[GE users] Can't read usage file error

Chris Dagdigian dag at sonsorol.org
Mon Nov 8 21:27:58 GMT 2004


Things to check off the top of my head...

1. Are you using full paths ("qsub $SGE_ROOT/examples/jobs/simple.sh") 
so that the node that runs the job can find the script? If you are 
submitting from your (shared) home directory, you can use the "-cwd" 
switch to tell SGE to run the job in the current working directory 
("qsub -cwd ./simple.sh").

2. What does "qstat -j <jobID>" tell you about the job itself? Any obvious 
error messages?

3. In general, how healthy is Grid Engine? Are all your queues 
running? (In "qstat -f" output, any entry in the states column could 
potentially be bad.)

4. Can you run an even simpler job? ("qrsh hostname")
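
If it helps, the four checks above can be strung together roughly like 
this. Treat it as a sketch, not a tested tool: it assumes qsub, qstat and 
qrsh are on your PATH and that SGE_ROOT is set. The function names 
(flag_bad_queues, run_checks) are made up for illustration, and the awk 
filter assumes the usual "qstat -f" layout where a queue line starts with 
queue@host and only gets a sixth field when a state is set.

```shell
#!/bin/sh
# Sketch only. Assumes qsub, qstat and qrsh are on PATH and SGE_ROOT is set.

# Read "qstat -f" output on stdin and print the queue instances whose
# trailing states field is non-empty.
# Assumption: a queue line starts with "queue@host" and has five fields
# when healthy, six when a state (for example "au" or "E") is set.
flag_bad_queues() {
    awk '$1 ~ /@/ && NF >= 6 { print $1, $NF }'
}

# Hypothetical wrapper around the four checks.
# $1 is the ID of a job that failed earlier.
run_checks() {
    job_id=$1

    # 1. Submit with a full path so the execution host can find the script.
    qsub "$SGE_ROOT/examples/jobs/simple.sh"

    # 2. Ask Grid Engine what it knows about the failed job (-j takes a job ID).
    qstat -j "$job_id"

    # 3. List queue instances that report any state at all.
    qstat -f | flag_bad_queues

    # 4. Try the simplest possible job.
    qrsh hostname
}

# Example (not run automatically): run_checks 2
```

Nothing runs until you call run_checks yourself with a job ID, so it is 
safe to paste the functions into a shell first and try each step by hand.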

Regards,
Chris



Paul Mitchell wrote:

> Hello,
>  First posting here. I've been trying to figure out why the following is
> occurring. I'm submitting the simple.sh example from my client
> (bp08.isis.unc.edu) and I get the following in the master
> (bp01.isis.unc.edu) messages file:
> 
> 11/08/2004 13:20:28|qmaster|bp01|W|rescheduling job 1.1
> 11/08/2004 13:23:58|qmaster|bp01|W|job 2.1 failed on host
> bp08.isis.unc.edu general opening input/output file because: can't read
> usage file for job 2.1
> 11/08/2004 13:23:58|qmaster|bp01|W|rescheduling job 2.1
> 11/08/2004 14:15:29|qmaster|bp01|W|job 4.1 failed on host
> bp08.isis.unc.edu general opening input/output file because: can't read
> usage file for job 4.1
> 11/08/2004 14:15:29|qmaster|bp01|W|rescheduling job 4.1
> 
> I've been searching through the archives and find a couple of cases where
> this error occurs.  Consequently, I've checked and my gridengine directory
> (and sub-directories) are owned (for the most part) by a local account
> "grid" on both machines, and the gid and uid of this account are identical
> on both machines.  Furthermore, my personal uid and gid match.
> 
> A copy of the script actually makes its way over to the master's
> spool/job_scripts directory, so I know that's working. What intermediate
> process/file would be different between these two machines to cause this?
> 
> The machines, BTW, are Xserves running OS X 10.3.5, and I'm trying to
> work with Grid Engine 6.
> 
> Thanks for any pointers,
> 
> Paul Mitchell
> ==============================================================================
> 	Paul Mitchell
> 	email: pmitchel at email.unc.edu
> 	phone: (919) 962-9778
> 	office: I have an office, room 14, Phillips Hall
> ==============================================================================
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

-- 
Chris Dagdigian, <dag at sonsorol.org>
BioTeam  - Independent life science IT & informatics consulting
Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net
