[GE users] SGE6.1 error

John_Tai John_Tai at smics.com
Mon Aug 13 08:56:40 BST 2007



I have recently installed 6.1, but every job is terminated after a while. 
 
This is my job from qstat, started as "qrsh -v eda=$cmd -cwd -now n icfb":
 
    950 0.55500 icfb       johnt        r     08/13/2007 14:48:02  layout.q@dsl46
 
Here is the message I get from the command line:
 
    error: error reading returncode of remote command
 
These are the qmaster messages:
 
    08/13/2007 15:03:34|qmaster|dsls11|W|job 950.1 failed on host dsl46 general before job because: 08/13/2007 15:03:31 [999:20475]: can't open file /tmp/950.1.layout.q/pid: Permission denied
 
These are the exec host messages:
 
    08/13/2007 15:03:31|execd|dsl46|E|shepherd of job 950.1 exited with exit status = 11
 
Looking at the qmaster messages, it seems that this happens every hour to the majority of jobs. It doesn't seem to be tied to any particular user or exec host.
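
One way to check that impression is to tally the failure lines by host and by hour. The following is only a minimal sketch: it assumes every line in the qmaster messages file has the same pipe-separated layout as the warning quoted above, and the function names are hypothetical helpers, not part of Grid Engine.

```python
import re
from collections import Counter

# Matches the warning text quoted in this post, e.g.
# "job 950.1 failed on host dsl46 ...". Assumed to be the stable wording.
FAILED = re.compile(r"job (\d+)\.(\d+) failed on host (\S+)")


def parse_failures(lines):
    """Yield (timestamp, job_id, host) for each 'failed on host' line."""
    for line in lines:
        # Assumed layout: timestamp|daemon|host|level|message
        parts = line.strip().split("|", 4)
        if len(parts) != 5:
            continue  # skip lines that do not fit the assumed layout
        timestamp, _daemon, _host, _level, message = parts
        match = FAILED.search(message)
        if match:
            yield timestamp, int(match.group(1)), match.group(3)


def failures_by_host(lines):
    """Count failed jobs per exec host."""
    return Counter(host for _ts, _job, host in parse_failures(lines))


def failures_by_hour(lines):
    """Count failed jobs per hour; the key is 'MM/DD/YYYY HH'."""
    return Counter(ts[:13] for ts, _job, _host in parse_failures(lines))
```

Passing the lines of the qmaster messages file to both functions gives a quick view of whether the failures cluster on one host or recur at a fixed interval.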
 
I hope somebody can help. I had been using 6.0u7-1 for a long time without problems, but since I changed the qmaster server and installed the latest version, I keep getting this problem.
 
Thanks in advance for your time.
 
John


