[GE users] Cant read usage file error

Paul Mitchell pmitchel at email.unc.edu
Tue Nov 9 15:23:08 GMT 2004


On Mon, 8 Nov 2004, Chris Dagdigian wrote:

>
> Things to check off the top of my head...
>
> 1. Are you using full paths? ("qsub $SGE_ROOT/examples/jobs/simple.sh")
> so that the node that runs the job can find the script? If you are
> running it from your home (shared) directory you can use the "-cwd"
> switch to tell SGE to assume current working directory ("qsub -cwd
> ./simple.sh")

Using full paths made absolutely no difference.  I was running the command
from the /usr/gridengine/examples/jobs directory.
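
One way to take relative paths out of the picture is to resolve the script to an absolute path before handing it to qsub. The following is only a sketch for a POSIX shell; `abs_path` is a hypothetical helper, not part of Grid Engine.

```shell
# Hypothetical helper: print the absolute path of an existing file.
abs_path() {
    # Resolve the containing directory, then append the file name.
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Usage (needs a working Grid Engine installation):
#   qsub "$(abs_path simple.sh)"
```

The path still has to exist on the execution host, so this only helps when the directory is shared or identical on every node.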

>
> 2. What does "qstat <jobID>" tell you about the job itself? Any obvious
> error messages?

$ qstat
job-ID  prior   name       user         state submit/start at     queue  slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:20:23        1
      2 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:23:47        1
      4 0.00000 test1      pmitchel     Eqw   11/08/2004 14:15:24        1
      5 0.00000 simple.sh  pmitchel     Eqw   11/09/2004 09:58:45        1
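
All four jobs are in state Eqw, that is, queued and waiting but flagged with an error. As a rough sketch, the error-state job IDs can be pulled out of the listing with awk. `error_jobs` is a hypothetical helper, and it assumes the state is the fifth whitespace-separated column, as in the output above.

```shell
# Hypothetical helper: read "qstat" output on stdin and print the IDs of
# jobs whose state column (assumed to be field 5) contains "E" (error).
error_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $5 ~ /E/ { print $1 }'
}

# Usage (needs a working Grid Engine installation):
#   qstat | error_jobs
#   qstat -j 1      # detailed record for job 1
```

The detailed record from `qstat -j <jobID>` is usually the place to look for the reason a job went into the error state.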

> 3. In general how is the health of Grid Engine? Are all your queues
> running ("qstat -f" - any entry in the States: column could  potentially
> be bad)

$ qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@bp01.isis.unc.edu        BIP   0/2       -NA-     darwin        au
----------------------------------------------------------------------------
all.q@bp08.isis.unc.edu        BIP   0/2       0.02     darwin

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
      1 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:20:23     1
      2 0.00000 simple.sh  pmitchel     Eqw   11/08/2004 13:23:47     1
      4 0.00000 test1      pmitchel     Eqw   11/08/2004 14:15:24     1
      5 0.00000 simple.sh  pmitchel     Eqw   11/09/2004 09:58:45     1
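
In the listing above, the queue on bp01 reports no load average and carries the state "au" (load alarm, unknown), which normally means the execution daemon on that host is not reporting in. A minimal sketch for picking out such queues follows. `flagged_queues` is a hypothetical helper, and it assumes raw `qstat -f` output in which queue instances are written as queue@host and the state, when present, is the sixth field.

```shell
# Hypothetical helper: read "qstat -f" output on stdin and print
# "queue-instance state" for every queue line with a non-empty states column.
flagged_queues() {
    awk '$1 ~ /@/ && NF >= 6 { print $1, $6 }'
}

# Usage (needs a working Grid Engine installation):
#   qstat -f | flagged_queues
```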


> 4. Can you run an even simpler job? ("qrsh hostname")

Here's where it gets interesting.  My simple job hung:

$ qrsh hostname
rcmd: socket: Operation not permitted

and when I Control-C'd it:

^Cerror: error waiting on socket for client to connect: Interrupted system call
error: error reading returncode of remote command

This leads me to believe that I've got the remote shell apparatus turned
off.  I'm a long-time Solaris administrator and a short-time OS X admin.  I
believe that the following is important:

cd xinetd.d
/etc/xinetd.d

$ grep rsh *
shell:  server          = /usr/libexec/rshd

$ ps -auxww | grep rshd
pmitchel  7592   0.0 -0.0    18172    344 std  S+   10:18AM   0:00.01 grep rshd

$ more shell
service shell
{
        disable         = yes
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/libexec/rshd
        groups          = yes
        flags           = REUSE
}

I've probably got rsh disabled.
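
The same check can be scripted so it is easy to repeat on every node. This is only a sketch: `xinetd_disabled` is a hypothetical helper, and it assumes the `disable = yes` layout shown in the file above.

```shell
# Hypothetical helper: exit 0 if the given xinetd service file contains a
# "disable = yes" line, exit 1 otherwise.
xinetd_disabled() {
    awk '$1 == "disable" && $2 == "=" && $3 == "yes" { found = 1 }
         END { exit !found }' "$1"
}

# Usage:
#   if xinetd_disabled /etc/xinetd.d/shell; then
#       echo "rsh (service shell) is disabled"
#   fi
```

Whether the system rshd is what qrsh actually depends on here is a separate question; the helper only reports what the xinetd file says.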

Note to Reuti:

> You changed the entries in $SGE_ROOT/default/common/configuration and
> restarted all the daemons?

I can find no file under /usr/gridengine, either on master or client, by
the name "configuration".

Paul Mitchell

==============================================================================
	Paul Mitchell
	email: pmitchel at email.unc.edu
	phone: (919) 962-9778
	office: I have an office, room 14, Phillips Hall
==============================================================================


