[GE users] Can't read usage file error

Robert Olson olson at mcs.anl.gov
Tue Nov 9 15:27:53 GMT 2004


I was getting those errors when file staging wasn't set up properly. I
think the app was exiting because of missing input, so the usage file
was never written.
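
If missing input is indeed the cause, one way to make it visible is to have
the job script check its inputs before starting the application. The sketch
below is only an illustration: require_input is a made-up helper, and
input.dat and my_app are placeholder names, not anything from this thread.

```shell
#!/bin/sh
# Hypothetical helper: complain on stderr and return non-zero if any of the
# listed input files is missing or unreadable, instead of letting the
# application exit without explanation.
require_input() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable input: $f" >&2
            return 1
        fi
    done
    return 0
}

# Example use inside a job script (placeholder names):
#   require_input input.dat || exit 1
#   ./my_app input.dat
```

With a check like this the message lands in the job's stderr file, which is
usually easier to interpret than a usage-file error.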

On Tue, 9 Nov 2004, Paul Mitchell wrote:

> On Mon, 8 Nov 2004, Chris Dagdigian wrote:
> 
> >
> > Things to check off the top of my head...
> >
> > 1. Are you using full paths? ("qsub $SGE_ROOT/examples/jobs/simple.sh")
> > so that the node that runs the job can find the script? If you are
> > running it from your home (shared) directory you can use the "-cwd"
> > switch to tell SGE to assume current working directory ("qsub -cwd
> > ./simple.sh")
> 
> Using full paths made absolutely no difference.  I was running the command
> from the /usr/gridengine/examples/jobs directory.
> 
> >
> > 2. What does "qstat <jobID>" tell you about the job itself? Any obvious
> > error messages?
> 
> $ qstat
> job-ID  prior   name       user         state submit/start at     queue  slots ja-task-ID
> ------------------------------------------------------------------------------------------
>       1 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:20:23        1
>       2 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:23:47        1
>       4 0.00000 test1      pmitchel     Eqw   11/08/2004 14:15:24        1
>       5 0.00000 simple.sh  pmitchel     Eqw   11/09/2004 09:58:45        1
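
Every job in the listing above is in state Eqw (error, queued, waiting). As
a rough sketch, the IDs of such jobs can be pulled out of qstat output with
awk. error_jobs is a made-up name, and the filter assumes the column layout
shown above, with the state in the fifth field.

```shell
#!/bin/sh
# Print the IDs of jobs whose state column contains "E" (error), e.g. Eqw.
# Assumes the qstat layout above: job-ID, prior, name, user, state, ...
# Header, separator, and wrapped continuation lines are skipped because
# they lack a numeric first field or a fifth field.
error_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /E/ { print $1 }'
}

# Example use:
#   qstat | error_jobs
```

Running "qstat -j <jobID>" on each ID printed should then give more detail
on why that job ended up in the error state.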
> 
> > 3. In general how is the health of Grid Engine? Are all your queues
> > running ("qstat -f" - any entry in the States: column could potentially
> > be bad)
> 
> $ qstat -f
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@bp01.isis.unc.edu        BIP   0/2       -NA-     darwin        au
> ----------------------------------------------------------------------------
> all.q@bp08.isis.unc.edu        BIP   0/2       0.02     darwin
> 
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>       1 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:20:23     1
>       2 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:23:47     1
>       4 0.00000 test1      pmitchel     Eqw   11/08/2004 14:15:24     1
>       5 0.00000 simple.sh  pmitchel     Eqw   11/09/2004 09:58:45     1
> 
> 
> > 4. Can you run an even simpler job? ("qrsh hostname")
> 
> here's where it gets interesting.  MY simple job hung:
> 
>  qrsh hostname
> rcmd: socket: Operation not permitted
> 
> and when I Control-C'd it:
> 
> ^Cerror: error waiting on socket for client to connect: Interrupted system call
> error: error reading returncode of remote command
> 
> This leads me to believe that I've got the remote shell apparatus turned
> off.  I'm a long-time Solaris administrator and a short-time OS X admin. I
> believe that the following is important:
> 
> cd xinetd.d
> /etc/xinetd.d
> 
> $ grep rsh *
> shell:  server          = /usr/libexec/rshd
> 
> $ ps -auxww | grep rshd
> pmitchel  7592   0.0 -0.0    18172    344 std  S+   10:18AM   0:00.01 grep rshd
> 
> $ more shell
> service shell
> {
>         disable         = yes
>         socket_type     = stream
>         wait            = no
>         user            = root
>         server          = /usr/libexec/rshd
>         groups          = yes
>         flags           = REUSE
> }
> 
> I've probably got rsh disabled.
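
To confirm that guess without reading the file by eye, a small check along
these lines might help. xinetd_disabled is a made-up name, and the sketch
assumes the "disable = yes" layout shown above, with spaces around the
equals sign.

```shell
#!/bin/sh
# Report whether an xinetd service file sets "disable = yes".
# Returns 0 if the service is disabled, 1 if it is not, and 2 if the file
# cannot be read. If "disable" appears more than once, the last one wins.
xinetd_disabled() {
    [ -r "$1" ] || return 2
    awk '$1 == "disable" && $2 == "=" { v = $3 }
         END { if (v == "yes") exit 0; exit 1 }' "$1"
}

# Example use:
#   if xinetd_disabled /etc/xinetd.d/shell; then
#       echo "rsh (shell) service is disabled"
#   fi
```

This only reports what the file says. Whether enabling rsh is what qrsh
actually needs here is a separate question that the check does not answer.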
> 
> Note to Reuti:
> 
> You changed the entries in $SGE_ROOT/default/common/configuration and
> restarted all the daemons?
> 
> I can find no file under /usr/gridengine, either on master or client, by
> the name "configuration".
> 
> Paul Mitchell
> 
> ==============================================================================
> 	Paul Mitchell
> 	email: pmitchel at email.unc.edu
> 	phone: (919) 962-9778
> 	office: I have an office, room 14, Phillips Hall
> ==============================================================================
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
